1.  Introduction


The most successful methods in classical optimization all use some information about second derivatives. First-order methods, like steepest descent, converge at a linear rate that is related to the condition number of the Hessian at the optimum, so convergence can be very slow if the Hessian is poorly conditioned. This poor behavior can be corrected by applying the first-order method in a space obtained by a linear transformation of the variables. Classical analysis clearly states that the best linear transformation (unique up to an equivalence class) is the square root of the Hessian at the optimum. This transformation yields a function whose Hessian at the optimum is the identity matrix, and superlinear, or quadratic, convergence follows. Translated back into the original variables, this gives a conceptual Newton method, which can be approximated in an implementable way by Newton or quasi-Newton methods. Both of these procedures use second-order information, explicitly or implicitly, to approximate iteratively the Hessian, or the inverse Hessian, at the optimum. In doing so they preserve, to some extent, the superlinear or quadratic convergence rate of the conceptual method.

The extension of these ideas to the problem of minimizing a nondifferentiable convex function, or, somewhat equivalently, to the problem of solving a system of linear inequalities (which includes the general linear programming problem), runs into a difficulty: the concept of a Hessian at the optimum does not exist.

The ellipsoid method was introduced by Shor [32] as an attempt to reproduce, as closely as possible, the behavior of classical quasi-Newton methods. Yudin and Nemirovski [36,37] showed that the ellipsoid method converges, on any convex function, at a rate which depends only upon the dimension n of the space and not on the specific function (the rate being approximately $1 - 1/(2n^2)$). They also showed that nothing much better can be expected from any algorithm which uses only the information given by an oracle that returns the function value and one subgradient at every consultation, so superlinear convergence is ruled out. Khacian [24,25] showed that, in the case of a system of linear inequalities, the ellipsoid method finds a solution in polynomial time. Here the word is used in the sense of the theory of computational complexity, i.e., polynomial in the length of the input data, and not in the size of the problem. The method has been studied further, and improved, by, among others, Akgul [2], Aspvall and Stone [3], Bland, Goldfarb and Todd [6], Gács and Lovász [11], Goldfarb and Todd [17], Grötschel, Lovász and Schrijver [18], and in [16].

In this paper, we study the conceptual method which the ellipsoid method implements in an approximate way; this last statement will be justified in a sequel. All of this is done only for the problem of solving a system of linear inequalities.


In section 2 we describe the first-order method that is used, the maximal distance relaxation method of Agmon [1] and Motzkin and Schoenberg [29]. We also reproduce the theory of its convergence, which critically involves the condition number $\rho^*$ (whose definition is mostly algebraic). This theory says that convergence is linear, at a rate given by $(1 - \rho^{*2})^{1/2}$, which may be very slow. Applying the algorithm in a transformed space, and translating it back into the original variables, defines a variable metric, maximal ellipsoidal distance, relaxation method. Its convergence rate is given by the equivalent of $\rho^*$ in the transformed space, so convergence can be expected to improve if the linear transformation is well chosen.

In section 3, two other "condition numbers" with a purely geometrical definition are described: $\nu$, where $\sin^{-1}\nu$ measures the angles of the feasible set, and $\sigma$, the asphericity of the feasible set. Various relationships between $\rho^*$, $\nu$ and $\sigma$ are given, and their behavior is investigated under perturbations of the feasible set and under linear transformations. This requires a rather detailed study of how the face lattice of the feasible set (and related lattices) behaves when perturbations and linear transformations are introduced. A concept of nondegeneracy, which differs slightly from the usual definitions, is introduced, and it is shown that all but a finite number of perturbations of the feasible set are nondegenerate. The section ends by showing that $\rho^*$ may be bounded below by the inverse of the asphericity of a compact, full-dimensional polyhedron, obtained from the feasible set by perturbation and compactification. This allows the geometrical results of section 6 to be applied in all cases.
In section 4, a termination routine is given which allows the relaxation method, which usually converges only in the limit, to terminate. It is a classical projection method (in the transformed space) whose convergence is not impaired by the type of degeneracy defined in section 3. As it is essentially a projected inverse "Hessian" method, the variable metric information can also be used in the termination routine, which might be a useful feature.

In section 5, under the assumption that the data are integer, we show how every computed number can be represented as a ratio of integers of polynomial size, so that the issue can be ignored later. A priori estimates of the various quantities used in the method are given, so as to make the whole algorithm implementable.

In section 6, the key geometrical result, due to John [23], is discussed and proved: for any compact convex set with a nonempty interior, there always exists an affine transform of the set whose asphericity is at most the dimension of the space. It is also shown that the linear map defined through the largest ellipsoid inscribed in the set (or the smallest ellipsoid circumscribed around it) has this property. Thus the ellipsoid matrix corresponding to the largest ellipsoid inscribed in a polyhedron (or in a convex set) may be viewed as a natural extension of the Hessian. A characterization is given which shows that the inverse "Hessian" is a positive linear combination of symmetric rank-one matrices built upon the normals to the facets of the polyhedron. It also reduces to the usual concept of an inverse when the set is an ellipsoid or a parallelotope.
In section 7, we show that every system of linear inequalities may be solved in polynomial time and space by the variable metric, maximal ellipsoidal distance, relaxation method, using an integer (scaled) inverse "Hessian" whose integers are of polynomial size. The length of the input data is denoted by L; it is never used explicitly, but is often referred to.




2.  Relaxation  methods  for  systems  of  linear  inequalities 

Let $Ax \le b$ be a system of linear inequalities, where $A \in \mathbb{R}^{m \times n}$ (or sometimes $\mathbb{Z}^{m \times n}$, where $\mathbb{Z}$ is the set of integers), $x \in \mathbb{R}^n$ and $b \in \mathbb{R}^m$ (or sometimes $\mathbb{Z}^m$).

This  system  may  also  be  written  as 

$(a^i, x) \le b_i, \quad i \in M$     (LI)

where $a^i = A^t e^i$ and $b_i = e^{i\,t} b$. Here $e^i$ is the $i$th column of the identity matrix of dimension $m$, $M = \{1, 2, \dots, m\}$, $(\cdot,\cdot)$ is the scalar product and $t$ means transpose.

It  will  be  assumed  that  no  row  of  A  is  identically  zero. 

The  solution  set  of  (LI)  is  a  polyhedron  P: 

$P = \{x \in \mathbb{R}^n : Ax \le b\}$

$\phantom{P} = \{x \in \mathbb{R}^n : ((a^i,x) - b_i)/|a^i| \le 0,\ i \in M\}$,

where $|\cdot|$ is the Euclidean norm.

The problem is to find a point satisfying all the inequalities, or to decide that no such point exists. The algorithm used is the maximal distance relaxation method of Agmon [1], with an n-step termination routine added to it.

Algorithm 1: the maximal distance relaxation method.

1. $x = 0$, $q = 0$;

2. select any $i \in I(x)$, where

$I(x) = \{i \in M : ((a^i,x) - b_i)/|a^i| = \max_{j \in M} ((a^j,x) - b_j)/|a^j|\}$;

3.
3.1. if $(a^i,x) - b_i \le 0$, stop: P is feasible and $x \in P$;

?other termination criteria?

4. $x^+ = x - ((a^i,x) - b_i)\, a^i/|a^i|^2$.

5. $q \leftarrow q+1$, $x \leftarrow x^+$, go to 2.

A notation indicating the iteration count ($x^q$, $i_q$) will be used only when unavoidable. A relaxation parameter may be introduced in step 4 [29,13,14]. It does not affect the theory given here, even though in practice it seems to improve the convergence of the algorithm significantly.

The ?other termination criteria? will be specified later; they consist of:

3.2. if $(a^i,x) - b_i > 0$ is small, P is feasible: go to an (n-step) termination routine to find $x' \in P$.

3.3. if q is large, stop: P is empty.
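For concreteness, here is a minimal NumPy sketch of algorithm 1. It is an illustration under stated assumptions, not the author's implementation. The unspecified tests 3.2 and 3.3 are replaced by a hypothetical iteration cap `max_iter`, ties in step 2 go to the lowest index, and `lam` is the optional relaxation parameter mentioned above (lam = 1 reproduces step 4 exactly).

```python
import numpy as np

def max_distance_relaxation(A, b, lam=1.0, max_iter=10_000):
    """Sketch of algorithm 1 (Agmon's maximal distance relaxation method).

    Returns (x, q, feasible). The cap max_iter is an assumption standing in
    for the termination criteria 3.2 and 3.3, which the text specifies later.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    sq_norms = np.einsum("ij,ij->i", A, A)      # |a^i|^2; rows assumed nonzero
    norms = np.sqrt(sq_norms)
    x = np.zeros(A.shape[1])                    # step 1: x = 0, q = 0
    for q in range(max_iter):
        residuals = A @ x - b                   # (a^i, x) - b_i
        i = int(np.argmax(residuals / norms))   # step 2: lowest index in I(x)
        if residuals[i] <= 0:                   # step 3.1: x lies in P
            return x, q, True
        # step 4: orthogonal projection of x on the hyperplane (a^i, x) = b_i
        x = x - lam * residuals[i] * A[i] / sq_norms[i]
    return x, max_iter, False

# Usage on the hypothetical system x1 >= 1, x2 >= 2, written as -x1 <= -1, -x2 <= -2.
x, q, feasible = max_distance_relaxation([[-1.0, 0.0], [0.0, -1.0]], [-1.0, -2.0])
```

On this example the method first projects onto $x_2 = 2$ (the more distant hyperplane) and then onto $x_1 = 1$, so it stops at $(1, 2)$ after two steps. On an inconsistent system it cycles until the cap is reached.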

If we define

$f(x) = \max_{i \in M} ((a^i,x) - b_i)/|a^i|$,

then $a^{i_q}/|a^{i_q}| \in \partial f(x^q)$, the subdifferential of f at $x^q$. If $f(x)$ is positive, it is the maximal distance from x to any hyperplane representing a violated constraint. If $f(x)$ is negative, then $-f(x)$ is the radius of the largest ball centered at x and contained in P.
The theory of convergence of algorithm 1 uses the properties of f, and also those of the function g:

$g(x) = \min\{|x - x'| : x' \in P\} = \sup\{r \ge 0 : (x + rS) \cap P = \emptyset\}$, if $x \notin P$,

$g(x) = -\sup\{r \ge 0 : x + rS \subset P\}$, if $x \in P$,

where $S = \{x \in \mathbb{R}^n : |x| \le 1\}$ is the unit ball.

The condition number $\rho^*(P)$ is defined by $\rho^*(P) = \inf\{f(x)/g(x) : x \notin P\}$; it is well known that $\rho^*(P) \in (0,1]$ (see Agmon [1], Hoffman [21], Todd [33], and [13,14]). It should be emphasized that $\rho^*(P)$ is not a function of P as a geometrical object: it depends on the representation of P by a specific system of inequalities.

Lemma 2.1

The functions f and g are convex, and

$f(x) = g(x)$, if $x \in P$,

$\rho^*(P)\, g(x) \le f(x) \le g(x)$, if $x \notin P$,

where both bounds are attained.

Proof.

The convexity of f follows because f is a maximum of affine functions. The convexity of g follows from geometrical results given in Hadwiger [20, pp. 149-150], which imply that the array of outer ($w > 0$) and inner ($w < 0$) parallel sets of P,

$Y_w = \{x \in \mathbb{R}^n : g(x) \le w\}$,

is a concave array. This means that $Y = \{(x,w) \in \mathbb{R}^{n+1} : x \in Y_w\}$ is a convex set. But $Y = \{(x,w) \in \mathbb{R}^{n+1} : g(x) \le w\}$ is the epigraph of g, so g is convex.

The facts that $f \le g$ and that both bounds are tight are easy to show [14]. QED
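Lemma 2.1 can be checked numerically on a simple case. The sketch below uses a hypothetical box $P = [-1,1]^2$, written with four normalized inequalities. Its function `g` relies on the fact that the Euclidean projection onto a box is coordinatewise clipping, so it is specific to this example. The grid search estimates $\rho^*(P) = \inf\{f/g : x \notin P\}$, whose exact value here is $1/\sqrt2$, attained along the diagonals near the vertices.

```python
import numpy as np

# Hypothetical example: P = [-1, 1]^2 written as x1 <= 1, -x1 <= 1, x2 <= 1, -x2 <= 1.
A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)
row_norms = np.linalg.norm(A, axis=1)

def f(x):
    """f(x) = max_i ((a^i, x) - b_i) / |a^i|."""
    return float(np.max((A @ x - b) / row_norms))

def g(x):
    """g(x) for this box only: distance to P outside P, minus the inradius at x inside P."""
    proj = np.clip(x, -1.0, 1.0)                 # Euclidean projection onto the box
    if np.any(proj != x):
        return float(np.linalg.norm(x - proj))
    return -float(np.min(1.0 - np.abs(x)))       # largest ball centred at x inside P

# Grid estimate of rho*(P) = inf { f(x)/g(x) : x not in P }.
grid = np.linspace(-3.0, 3.0, 61)
outside = [np.array([u, v]) for u in grid for v in grid if max(abs(u), abs(v)) > 1.0]
ratios = [f(x) / g(x) for x in outside]
rho_star_estimate = min(ratios)
```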

Theorem 2.2

The maximal distance relaxation method, applied to the system of inequalities (LI) assumed to be consistent, generates a sequence of iterates which converges finitely or infinitely to a point $x^*$ which solves (LI). Furthermore:

$|x^q - x^*| \le 2\, g(x^0)\, \theta^q$,

$g(x^q) \le \theta\, g(x^{q-1}) \le \theta^q\, g(x^0)$,

$f(x^q) \le (\rho^*(P))^{-1}\, \theta^q\, f(x^0)$,

where $\theta = (1 - (\rho^*(P))^2)^{1/2}$.

Proof: see Agmon [1], or [13,14].

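A rough sketch of the key part of the proof goes as follows. Let $\bar x$ be the closest point to x in P, so that $(x + g(x)S) \cap P = \{\bar x\}$. The points x, $x^+$ and $\bar x$ define a triangle which is obtuse at $x^+$. Indeed, $x^+$ is the projection of x on a hyperplane bounding a halfspace that contains $\bar x$. Using $|x - x^+| = f(x)$ (as $i \in I(x)$) and $f(x) \ge \rho^*(P)\, g(x)$, it follows that

$|x^+ - \bar x|^2 \le |x - \bar x|^2 - |x - x^+|^2 = g^2(x) - f^2(x) \le \theta^2 g^2(x)$.

Hence

$(x^+ + \theta g(x) S) \cap P \supseteq (x^+ + |x^+ - \bar x| S) \cap P \ni \bar x$,

and $g(x^+) \le \theta g(x)$. QED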

The method has the property that the sequence of iterates is Fejér-monotone [29], i.e.,

$|x^{q+1} - x'| \le |x^q - x'|$, for all $x' \in P$.

Theorem 2.2 remains valid if a relaxation parameter $\lambda \in (0,2)$ is introduced in step 4 of algorithm 1, provided that $\theta$ is defined by

$\theta = (1 - \lambda(2 - \lambda)(\rho^*(P))^2)^{1/2}$

[13,14]. In practice, a relaxation parameter greater than one seems to improve the convergence of the algorithm significantly. The method may still be excruciatingly slow, however, or non-polynomial [15,33].
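The bounds of Theorem 2.2, with or without the relaxation parameter, translate into a worst-case iteration count. The Python sketch below uses illustrative function names. It computes $\theta$ and the smallest q with $\theta^q \le \varepsilon$, which guarantees $g(x^q) \le \varepsilon\, g(x^0)$. For small $\rho^*(P)$ this is roughly $2\ln(1/\varepsilon)/\rho^*(P)^2$, which explains why a poorly conditioned system can be so slow.

```python
import math

def rate(rho_star, lam=1.0):
    """theta = (1 - lam*(2 - lam)*rho_star**2)**(1/2), the ratio in Theorem 2.2 (lam = 1: no relaxation)."""
    return math.sqrt(1.0 - lam * (2.0 - lam) * rho_star ** 2)

def iterations_needed(rho_star, eps, lam=1.0):
    """Smallest q with theta**q <= eps, which by Theorem 2.2 guarantees g(x^q) <= eps * g(x^0)."""
    theta = rate(rho_star, lam)
    if theta == 0.0:        # rho* = 1 and lam = 1: g(x^1) <= 0, so one step reaches P
        return 1
    return math.ceil(math.log(eps) / math.log(theta))

# Hypothetical values: a well conditioned and a poorly conditioned system, eps = 1e-6.
print(iterations_needed(2 ** -0.5, 1e-6))   # 40
print(iterations_needed(1e-2, 1e-6))        # roughly 2 ln(1e6) / 1e-4, about 2.76e5
```

Since $\lambda(2-\lambda) \le 1$, with equality only at $\lambda = 1$, this bound is smallest without relaxation. The practical gains observed with $\lambda > 1$ are therefore not explained by it.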

It is reasonable to expect that a well-chosen linear transformation of the space may improve the rate of convergence of algorithm 1.

Let $x = Ty$, where T is a nonsingular linear transformation; then (LI) becomes

$(a^i, Ty) \le b_i, \quad i \in M$,

or

$(T^t a^i, y) \le b_i, \quad i \in M$.

The solution set of this system of linear inequalities is $T^{-1}P$. The maximal distance relaxation method applied to this transformed problem is given below.


Algorithm 1'

1. $y = 0$, $q = 0$;

2. select any $i \in I(y,T)$, where

$I(y,T) = \{i \in M : ((T^t a^i, y) - b_i)/|T^t a^i| = \max_{j \in M} ((T^t a^j, y) - b_j)/|T^t a^j|\}$;

3.
3.1. if $(T^t a^i, y) - b_i \le 0$, stop: $T^{-1}P$ is feasible and $y \in T^{-1}P$ (also P is feasible, and $x = Ty \in P$);

?other termination criteria?

4. $y^+ = y - ((T^t a^i, y) - b_i)\, T^t a^i/|T^t a^i|^2$.

5. $q \leftarrow q+1$, $y \leftarrow y^+$, go to 2.
The convergence of algorithm 1' is given by a theorem analogous to Theorem 2.2, which uses the functions:

1. $f(y,T) = \max_{i \in M} ((T^t a^i, y) - b_i)/|T^t a^i|$;

clearly $T^t a^{i_q}/|T^t a^{i_q}| \in \partial f(y^q, T)$.

2. $g(y,T) = \min\{|y - y'| : y' \in T^{-1}P\} = \sup\{r \ge 0 : (y + rS) \cap T^{-1}P = \emptyset\}$, if $y \notin T^{-1}P$,

$g(y,T) = -\sup\{r \ge 0 : y + rS \subset T^{-1}P\}$, if $y \in T^{-1}P$.

If one defines

$\rho^*(T^{-1}P) = \inf\{f(y,T)/g(y,T) : y \notin T^{-1}P\}$,

then Lemma 2.1 and Theorem 2.2 apply. Note that $f(y,T)$ is not equal to $f(Ty)$, because of the normalizations used in defining $f(x)$ and $f(y,T)$.

Algorithm  1'  may  be  expressed  in  terms  of  the  original  variable 
x,  and  leads  to  a  (fixed)  variable  metric,  maximal  ellipsoidal 
distance,  relaxation  method. 




Algorithm 2

Define $H = TT^t$.

1. $x = 0$, $q = 0$.

2. select any $i \in I(x,T)$, where

$I(x,T) = \{i \in M : ((a^i,x) - b_i)/(a^{i\,t} H a^i)^{1/2} = \max_{j \in M} ((a^j,x) - b_j)/(a^{j\,t} H a^j)^{1/2}\}$;

3.
3.1. if $(a^i,x) - b_i \le 0$, stop: P is feasible and $x \in P$;

?other termination criteria?

4. $x^+ = x - ((a^i,x) - b_i)\, H a^i/(a^{i\,t} H a^i)$.

5. $q \leftarrow q+1$, $x \leftarrow x^+$, go to 2.

Algorithm 2 could be described as an ellipsoid method with a fixed ellipsoid.

Suppose that ties in step 2 of algorithms 1' and 2 are broken in the same fashion. Then the sequences generated satisfy

$x^q = Ty^q$, for all q,

since $TT^t a^i = Ha^i$ and $|T^t a^i|^2 = a^{i\,t} H a^i$. Thus Lemma 2.1 and Theorem 2.2 can be adapted to give a convergence theory for algorithm 2.
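The correspondence $x^q = Ty^q$ can be checked numerically. The sketch below uses hypothetical data and breaks ties by lowest index in both runs. It implements algorithm 2 for an arbitrary positive definite H. With H = I it reduces to algorithm 1, so algorithm 1' is obtained by running it on the rows $T^t a^i$, i.e., on the matrix AT. The iteration cap is an assumption of the sketch.

```python
import numpy as np

def relaxation(A, b, H, max_iter=1000):
    """Sketch of algorithm 2 with a fixed positive definite H; with H = I it is algorithm 1.

    Returns the list of iterates and whether step 3.1 was reached.
    """
    A, b, H = (np.asarray(M, dtype=float) for M in (A, b, H))
    quad = np.einsum("ij,jk,ik->i", A, H, A)           # a^i' H a^i
    x = np.zeros(A.shape[1])                           # step 1
    iterates = [x.copy()]
    for _ in range(max_iter):
        residuals = A @ x - b
        i = int(np.argmax(residuals / np.sqrt(quad)))  # step 2: deepest cut in the metric |.|_E
        if residuals[i] <= 0:                          # step 3.1
            return iterates, True
        x = x - residuals[i] * (H @ A[i]) / quad[i]    # step 4
        iterates.append(x.copy())
    return iterates, False

# Hypothetical data: x1 >= 1, x2 >= 2 and an arbitrary nonsingular T.
A = np.array([[-1.0, 0.0], [0.0, -1.0]])
b = np.array([-1.0, -2.0])
T = np.array([[2.0, 0.0], [1.0, 1.0]])

xs, ok_x = relaxation(A, b, T @ T.T)          # algorithm 2, in x
ys, ok_y = relaxation(A @ T, b, np.eye(2))    # algorithm 1', in y (rows of A @ T are T^t a^i)
```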

Let $E = TS = \{x \in \mathbb{R}^n : x^t H^{-1} x \le 1\}$ be "the" ellipsoid, and $E^d$ its dual, $E^d = T^{-t}S = \{x \in \mathbb{R}^n : x^t H x \le 1\}$. The corresponding ellipsoidal norms are given by

$|x|_E = \inf\{r > 0 : x \in rE\} = |T^{-1}x| = (x^t H^{-1} x)^{1/2}$,

$|x|_{E^d} = \inf\{r > 0 : x \in rE^d\} = |T^t x| = (x^t H x)^{1/2}$.
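The following small sketch implements the two ellipsoidal norms for a hypothetical T. Each norm is computed through T and checked against the formula in H. The sketch also checks the duality $|(a,x)| \le |a|_{E^d}\,|x|_E$, which follows from $(a,x) = (T^t a, T^{-1}x)$, and the eigenvalue bounds used later in the proof of Lemma 2.3.

```python
import numpy as np

T = np.array([[2.0, 0.0], [1.0, 1.0]])    # hypothetical nonsingular T
H = T @ T.T

def norm_E(x):
    """|x|_E = |T^{-1} x|, the gauge of the ellipsoid E = T S."""
    return float(np.linalg.norm(np.linalg.solve(T, x)))

def norm_Ed(x):
    """|x|_{E^d} = |T^t x|, the gauge of the dual ellipsoid E^d = T^{-t} S."""
    return float(np.linalg.norm(T.T @ x))

x = np.array([3.0, -1.0])
a = np.array([1.0, 2.0])
lam, Lam = np.linalg.eigvalsh(H)[[0, -1]]   # smallest and largest eigenvalues of H
```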


One thus defines the functions:

1. $\Phi(x,T) = \max_{i \in M} ((a^i,x) - b_i)/(a^{i\,t} H a^i)^{1/2} = \max_{i \in M} ((a^i,x) - b_i)/|a^i|_{E^d}$;

2. $\Psi(x,T) = \min\{|x - x'|_E : x' \in P\} = \sup\{r \ge 0 : (x + rE) \cap P = \emptyset\}$, if $x \notin P$,

$\Psi(x,T) = -\sup\{r \ge 0 : x + rE \subset P\}$, if $x \in P$.

The direction used at step q is given by

$d^q = H g^q$, so that $x^{q+1} = x^q - \Phi(x^q,T)\, d^q$,

where

$g^q = a^{i_q}/(a^{i_q\,t} H a^{i_q})^{1/2} \in \partial\Phi(x^q,T)$.

The direction is thus a subgradient of $\Phi$ multiplied by the positive definite symmetric matrix H. Algorithm 2 is therefore quite similar to a Newton method with a fixed variable metric, in which H plays the role of the inverse Hessian.

The hyperplane selected at step q is, in the terminology of the ellipsoid method (see Goldfarb and Todd [17], and Bland, Goldfarb and Todd [6]), one giving the "deepest" cut. It is also a (violated) hyperplane most distant from x in the metric $|\cdot|_E$. The next iterate $x^+$ is thus the projection, in the metric $|\cdot|_E$, of x on this hyperplane. The sequence of iterates also satisfies an extension of Fejér-monotonicity:

$|x^{q+1} - x'|_E \le |x^q - x'|_E$, for all $x' \in P$.
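Step 4 of algorithm 2 is a projection in the metric $|\cdot|_E$. The sketch below uses a hypothetical H, a and $\beta$. It checks that the new point lies on the hyperplane and that its $|\cdot|_E$-distance from x equals $((a,x) - \beta)/|a|_{E^d}$. It also checks that points of the halfspace $(a,x') \le \beta$ get no farther away, which is the one-step form of the Fejér property above.

```python
import numpy as np

def e_projection(x, a, beta, H):
    """Step 4 of algorithm 2: project x on (a, z) = beta in the metric |.|_E, where H = T T^t."""
    return x - (a @ x - beta) * (H @ a) / (a @ H @ a)

def norm_E(z, H):
    """|z|_E = (z' H^{-1} z)^{1/2}."""
    return float(np.sqrt(z @ np.linalg.solve(H, z)))

H = np.array([[4.0, 2.0], [2.0, 2.0]])   # hypothetical; equals T T^t for T = [[2, 0], [1, 1]]
a = np.array([0.0, -1.0])                # violated constraint -x2 <= -2, i.e. x2 >= 2
beta = -2.0
x = np.zeros(2)
x_plus = e_projection(x, a, beta, H)

# One-step Fejér property, on random points of the halfspace (a, x') <= beta.
rng = np.random.default_rng(1)
samples = rng.uniform(-10.0, 10.0, size=(500, 2))
inside = samples[samples @ a <= beta]
fejer_ok = all(norm_E(x_plus - p, H) <= norm_E(x - p, H) + 1e-12 for p in inside)
```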

Lemma 2.3

Let $\Phi(x,T)$, $\Psi(x,T)$, $f(y,T)$, $g(y,T)$, $f(x)$ and $g(x)$ be defined as above; then:

1. $\Phi(x,T) = f(T^{-1}x, T)$,

$\Psi(x,T) = g(T^{-1}x, T)$.

2. $\Phi$ and $\Psi$ are convex functions of x.

3. $\Phi(x,T) = \Psi(x,T)$, if $x \in P$,

$\rho^*(T^{-1}P)\, \Psi(x,T) \le \Phi(x,T) \le \Psi(x,T)$, if $x \notin P$,

where both bounds are attained.

4. $\rho^*(T^{-1}P) = \inf\{\Phi(x,T)/\Psi(x,T) : x \notin P\}$.

5. $\Lambda^{-1/2}(H)\, |f(x)| \le |\Phi(x,T)| \le \lambda^{-1/2}(H)\, |f(x)|$,

$\Lambda^{-1/2}(H)\, |g(x)| \le |\Psi(x,T)| \le \lambda^{-1/2}(H)\, |g(x)|$,

for all $x \in \mathbb{R}^n$

(where $\lambda$ and $\Lambda$ denote the smallest and largest eigenvalues).

Proof

The fact that $\Phi(x,T) = f(T^{-1}x, T)$ follows directly from the definitions of $\Phi(x,T)$ and $f(y,T)$. The identity $\Psi(x,T) = g(T^{-1}x, T)$ is also clear, as

$T^{-1}(x + rE) = y + rS$, if $Ty = x$.

Thus 2, 3 and 4 are rewritings of Lemma 2.1.

And 5 follows from:

$\lambda^{1/2}(H^{-1})\, |x| \le |x|_E \le \Lambda^{1/2}(H^{-1})\, |x|$,

$\lambda^{1/2}(H)\, |x| \le |x|_{E^d} \le \Lambda^{1/2}(H)\, |x|$. QED
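Items 1 and 5 of Lemma 2.3 are easy to test numerically. The sketch below uses a hypothetical random system and a fixed, lower-triangular (hence nonsingular) T. It evaluates $\Phi(x,T)$ directly and as $f(T^{-1}x, T)$, using the fact that the rows of AT are the vectors $T^t a^i$. It then checks the eigenvalue bounds relating $|\Phi|$ to $|f|$ on random points.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))                   # hypothetical system (a^i, x) <= b_i
b = rng.normal(size=6)
T = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 1.0, 3.0]])   # hypothetical, nonsingular
H = T @ T.T
lam_min, lam_max = np.linalg.eigvalsh(H)[[0, -1]]

def f(x):
    """f(x) = max_i ((a^i, x) - b_i) / |a^i|."""
    return np.max((A @ x - b) / np.linalg.norm(A, axis=1))

def Phi(x):
    """Phi(x, T) = max_i ((a^i, x) - b_i) / (a^i' H a^i)^{1/2}."""
    return np.max((A @ x - b) / np.sqrt(np.einsum("ij,jk,ik->i", A, H, A)))

def f_T(y):
    """f(y, T) = max_i ((T^t a^i, y) - b_i) / |T^t a^i|; the rows of A @ T are the T^t a^i."""
    AT = A @ T
    return np.max((AT @ y - b) / np.linalg.norm(AT, axis=1))

points = rng.normal(scale=5.0, size=(1000, 3))
item1_ok = all(np.isclose(Phi(x), f_T(np.linalg.solve(T, x))) for x in points)
item5_ok = all(
    lam_max ** -0.5 * abs(f(x)) <= abs(Phi(x)) + 1e-12
    and abs(Phi(x)) <= lam_min ** -0.5 * abs(f(x)) + 1e-12
    for x in points
)
```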


Theorem 2.4

If algorithm 2 is applied to the system of inequalities (LI), assumed to be consistent, then it generates a sequence of iterates which converges finitely or infinitely to a point $x^*$ which solves (LI). Furthermore,

$|x^q - x^*|_E \le 2\,\theta^q(T)\, \Psi(x^0,T)$,

$\Psi(x^q,T) \le \theta(T)\, \Psi(x^{q-1},T) \le \theta^q(T)\, \Psi(x^0,T)$,

$\Phi(x^q,T) \le (\rho^*(T^{-1}P))^{-1}\, \theta^q(T)\, \Phi(x^0,T)$,

$f(x^q) \le \Lambda^{1/2}(H)\, \lambda^{-1/2}(H)\, (\rho^*(T^{-1}P))^{-1}\, \theta^q(T)\, f(x^0)$,

where

$\theta(T) = (1 - (\rho^*(T^{-1}P))^2)^{1/2}$.

Proof

The sequences generated by algorithms 1' and 2 satisfy $x^q = Ty^q$ (if ties are broken in the same fashion). Thus this theorem simply translates Theorem 2.2, using Lemma 2.3. QED
The relationship between algorithms 1' and 2 means that all the results about finite convergence of the maximal distance relaxation method (see Motzkin and Schoenberg [29], Eaves [9], Todd [33], and [13,14]) extend to algorithm 2.

The  proofs  given  in  this  paper  do  not  extend  to  other 
implementations  of  the  relaxation  method,  like  the  maximal  residual 
relaxation  method;  it  is  not  clear  whether  the  results  would  extend, 
or  not. 

In the maximal residual relaxation method, the only change from algorithm 1 (or 1' and 2) is in the selection of a violated constraint (step 2):

select $i \in \bar I(x)$, where

$\bar I(x) = \{i \in M : (a^i,x) - b_i = \bar f(x)\}$ and $\bar f(x) = \max_{i \in M} ((a^i,x) - b_i)$.

The convergence theory [13,14] is based upon a condition number $\rho(P)$ defined by

$\rho(P) = \inf_{x \notin P} \frac{1}{g(x)} \min_{i \in \bar I(x)} \frac{\bar f(x)}{|a^i|}$,

which can be related to $\rho^*(P)$ by

$\rho(P) \ge \rho^*(P)\, \min_{i \in M} |a^i| \,/\, \max_{i \in M} |a^i|$.

Thus the good behavior of $\rho^*$ under linear transformations does not extend to $\rho$, because the norms of the rows of A do not behave well under linear transformations.
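The difference between the two selection rules shows up on a hypothetical example. Rescaling one inequality by a positive factor leaves P, f and the maximal distance choice unchanged, but it can change the maximal residual choice. This is why $\rho(P)$ depends on the row norms.

```python
import numpy as np

def pick_max_distance(A, b, x):
    """Step 2 of algorithm 1: largest normalized residual ((a^i, x) - b_i)/|a^i|."""
    return int(np.argmax((A @ x - b) / np.linalg.norm(A, axis=1)))

def pick_max_residual(A, b, x):
    """Maximal residual rule: largest raw residual (a^i, x) - b_i."""
    return int(np.argmax(A @ x - b))

# Hypothetical system x1 <= -1, x2 <= -0.5, and the same system with row 2 multiplied by 10.
A1, b1 = np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([-1.0, -0.5])
A2, b2 = np.array([[1.0, 0.0], [0.0, 10.0]]), np.array([-1.0, -5.0])
x = np.zeros(2)
```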
A minor annoyance in studying polynomiality is that numbers must be represented in a space polynomial in the length of the input. The function f clearly takes irrational values at points x which are rational. If the data (A,b) are integer, this can be ignored by using the function $\hat f(x)$ in step 2 of algorithm 1 (or a function $\hat\Phi(x,T)$ in algorithm 2, if T is integer):

$\hat f(x) = \max_{i \in M} ((a^i,x) - b_i)/\lfloor |a^i| \rfloor$.

Let also $\hat I(x) = \{i \in M : ((a^i,x) - b_i)/\lfloor |a^i| \rfloor = \hat f(x)\}$, and

$\hat\rho(P) = \inf_{x \notin P} \frac{1}{g(x)} \min_{i \in \hat I(x)} \frac{(a^i,x) - b_i}{|a^i|}$.

It is easy to see that, if $a^i$ is integer (so that $|a^i|^2$ is a positive integer k, and $\sqrt{k}/\lfloor\sqrt{k}\rfloor \le \sqrt{3}$), then

$|a^i| \le \sqrt{3}\, \lfloor |a^i| \rfloor$ and $\hat\rho(P) \ge \rho^*(P)/\sqrt{3}$;

also

$|f(x)| \le |\hat f(x)| \le \sqrt{3}\, |f(x)|$, for $x \in \mathbb{R}^n$.

Thus, if in step 2 one selects any $i \in \hat I(x)$ (which can be done using integer arithmetic, if x is rational), the convergence theory is not affected in any significant manner.
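Both claims can be checked with exact integer arithmetic: the constant $\sqrt3$, and the fact that the selection in step 2 only needs integers and rationals. In the sketch below, $\lfloor |a^i| \rfloor$ is computed as `math.isqrt` of the integer $|a^i|^2$, and the comparison of the ratios uses `fractions.Fraction`. The function names and the test data are illustrative.

```python
from fractions import Fraction
from math import isqrt, sqrt

def floor_norm(a):
    """floor(|a|) for an integer vector a, computed exactly as isqrt(|a|^2)."""
    return isqrt(sum(ai * ai for ai in a))

def f_hat_index(A, b, x):
    """One index in I-hat(x): maximize ((a^i, x) - b_i)/floor(|a^i|) in exact rational arithmetic."""
    values = [Fraction(sum(ai * xi for ai, xi in zip(a, x)) - bi) / floor_norm(a)
              for a, bi in zip(A, b)]
    return max(range(len(values)), key=values.__getitem__)

# For a nonzero integer vector, |a|^2 = k >= 1, and sqrt(k)/floor(sqrt(k)) peaks at k = 3.
worst_ratio = max(sqrt(k) / isqrt(k) for k in range(1, 10_000))
```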
The convergence theory is easy to rewrite, and gives

$\hat f(x^q) \le \sqrt{3}\, \hat\theta^q\, (\hat\rho(P))^{-1}\, \hat f(x^0)$,

with

$\hat\theta = (1 - (\hat\rho(P))^2)^{1/2}$.

The issue of polynomial space will be neglected throughout this paper. It can clearly be taken care of quite easily by using $\hat f(x)$ (or $\hat f(y,T)$, or $\hat\Phi(x,T)$ if T is integer).




3.  Behavior  of  condition  numbers  and  face  lattices  under  perturbation 
Throughout  the  remainder  of  this  paper,  the  following  assumption 
will  sometimes  be  made,  mostly  for  notational  convenience,  as  it  does 
not  imply  any  loss  in  generality. 

Assumption  3.1: 

Rank  A  =  n,  or,  equivalently,  the  null  space  of  A  reduces 
to  the  origin. 

This means that if P is not empty, it contains no line. The assumption is not restrictive: it always holds within the subspace $R(A^t)$, the range of $A^t$, and the sequence $x^q$ remains in $x^0 + R(A^t)$. The same holds in algorithms 1' or 2 if $R(HA^t) = R(A^t)$.

If Assumption 3.1 holds, then any nonempty polyhedron has vertices, and is the sum of a bounded polyhedron and a pointed cone. If Assumption 3.1 does not hold, this statement is true within $R(A^t)$.

Define the family of perturbed polyhedra

$P_w = \{x \in \mathbb{R}^n : (a^i,x) - b_i \le |a^i|\, w,\ i \in M\} = \{x \in \mathbb{R}^n : f(x) \le w\}$;

and the epigraph of f:

$Q = \{(x,w) \in \mathbb{R}^{n+1} : f(x) \le w\} = \{(x,w) \in \mathbb{R}^{n+1} : x \in P_w\}$.

Clearly $P = P_0$. Note that the definitions of $P_w$ and Q include a normalization of the inequalities. Other perturbed sets need to be defined if other normalizations are used:

$\bar Q = \{(x,w) \in \mathbb{R}^{n+1} : \bar f(x) \le w\}$,

$\hat Q = \{(x,w) \in \mathbb{R}^{n+1} : \hat f(x) \le w\}$,

$Q(T) = \{(x,w) \in \mathbb{R}^{n+1} : \Phi(x,T) \le w\}$,

$\hat Q(T) = \{(x,w) \in \mathbb{R}^{n+1} : \hat\Phi(x,T) \le w\}$,

and the corresponding $\bar P_w$, $\hat P_w$, $P_w(T)$ and $\hat P_w(T)$; clearly $P = P_0 = \bar P_0 = \hat P_0 = P_0(T) = \hat P_0(T)$.

In what follows we study condition numbers for systems of linear inequalities, and how they vary under perturbations. To do so, we must describe how the face lattice (and related lattices) behaves under perturbations.

The lattice of faces of P, ordered by set inclusion, will be denoted by $\mathcal{F}(P)$; $\mathcal{F}(P)$ includes the empty set, unless P is a cone. To the face lattice one may associate the following lattices:

1. indices $\mathcal{I}(P)$:

$F \to I(F) = \{i \in M : (a^i,x) - b_i = 0 \text{ for all } x \in F\}$
$\phantom{F \to I(F)} = \{i \in M : (a^i,x) - b_i = 0 \text{ for some } x \in \mathrm{ri}\,F\}$,

where ri means relative interior.

2. tangent cones $\mathcal{C}(P)$:

$F \to C_P(F) = \{x \in \mathbb{R}^n : (a^i,x) \le 0 \text{ for all } i \in I(F)\}$.

3. normal cones $\mathcal{N}(P)$:

$F \to N_P(F) = [C_P(F)]^\circ$ (the polar cone)
$\phantom{F \to N_P(F)} = \{y \in \mathbb{R}^n : (x,y) \le 0 \text{ for all } x \in C_P(F)\}$
$\phantom{F \to N_P(F)} = \{\sum_{i \in I(F)} \lambda_i a^i : \lambda_i \ge 0,\ i \in I(F)\}$.

4. subdifferentials $\mathcal{D}(P)$:

$F \to \partial f(F) = \mathrm{conv}\{a^i/|a^i| : i \in I(F)\}$
$\phantom{F \to \partial f(F)} = \partial f(x)$ for some (or any) $x \in \mathrm{ri}\,F$,

where conv denotes convex hull.

Clearly $\mathcal{C}(P)$ is isomorphic to $\mathcal{F}(P)$, while $\mathcal{I}(P)$, $\mathcal{N}(P)$ and $\mathcal{D}(P)$ are isomorphic to one another and anti-isomorphic, or dual, to $\mathcal{F}(P)$ and $\mathcal{C}(P)$.

It  will  be  necessary  to  describe  the  behavior  of  these  lattices 
under  various  normalizations  of  the  defining  inequalities,  and  under 
linear  transformations. 
The index lattice is constant under affine transformations and normalizations. Here "constant" means that the lattices are isomorphic and that the objects composing them are identical.

The lattices $\mathcal{F}(P)$, $\mathcal{C}(P)$ and $\mathcal{N}(P)$ are constant under normalization, while affine transformations induce isomorphisms.

Normalizations and affine transformations induce isomorphisms of $\mathcal{D}(P)$.

Difficulties occur at the level of the face lattices of the subdifferentials $\partial f(F)$ (the elements of $\mathcal{D}(P)$). There, affine transformations induce isomorphisms, but normalizations completely change the lattice structure. Some key proofs will need to operate at the level of the face lattices of the subdifferentials.

The following lemma will be used repeatedly. It is simply a rewriting of the characterization of the projection map onto a convex set (see [13]).

Lemma 3.2

For every polyhedron P in $\mathbb{R}^n$, both

$\{\mathrm{ri}\,F + N_P(F) : F \in \mathcal{F}(P)\}$

and

$\{F + \mathrm{ri}\,N_P(F) : F \in \mathcal{F}(P)\}$

are partitions of $\mathbb{R}^n$. Equivalently, every $x \in \mathbb{R}^n$ may be written uniquely as $x = y + z$, where y and z belong to dual elements of $\mathcal{F}(P)$ and $\mathcal{N}(P)$. Here y is the projection of x on P, while z is the outer normal of the halfspace most distant from x among the halfspaces containing P. If P is a cone, then y and z are orthogonal.

Proof

Given $x \in \mathbb{R}^n$, let y be the projection of x on P and $z = x - y$.

For the first partition, associate to x the smallest face of P which contains y. For the second partition, associate P to x if $z = 0$; if $z \ne 0$, associate to x the face of P which is the set of maximizers of $(z,x')$ for $x' \in P$. QED
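For a cone, Lemma 3.2 is Moreau's decomposition. A minimal sketch for the hypothetical cone $P = \{x : x \le 0\}$ (so $a^i = e^i$, $b_i = 0$) follows; here the projection is explicit. The indices where y vanishes form $I(F)$ for the smallest face F containing y, and z is a nonnegative combination of the corresponding $a^i$.

```python
import numpy as np

def decompose(x):
    """Lemma 3.2 for the cone P = {x : x <= 0} (a^i = e^i, b_i = 0): x = y + z."""
    y = np.minimum(x, 0.0)      # projection of x on P
    z = x - y                   # componentwise positive part of x, a vector of N_P(F)
    return y, z

rng = np.random.default_rng(2)
x = rng.normal(size=5)
y, z = decompose(x)
active = (y == 0.0)             # I(F) for the smallest face F of P containing y
```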

The rate of convergence of the relaxation method depends critically on the condition number $\rho^*(P)$. Its definition was mostly algebraic. More geometric interpretations of $\rho^*(P)$ will now be given, as well as bounds in terms of purely geometric concepts.

Definition 3.3

The asphericities of P, where P is assumed bounded and full dimensional:

$\sigma(P) = \inf\{\sigma > 0 : x + rS \subset P \subset x + \sigma r S \text{ for some } x \in \mathbb{R}^n,\ r > 0\}$

and

$\sigma'(P) = \inf\{\sigma > 0 : x^* + r^*S \subset P \subset x^* + \sigma r^* S,\ x^* \in P^*\}$,

where

$r^* = \sup\{r \ge 0 : x + rS \subset P \text{ for some } x \in \mathbb{R}^n\}$,

$P^* = \{x \in \mathbb{R}^n : x + r^*S \subset P\} = \{x \in \mathbb{R}^n : f(x) = \min_{x' \in \mathbb{R}^n} f(x')\}$.

The first definition is more natural in geometry, while the second is more natural in the context of linear inequalities. Clearly $\sigma(P) \le \sigma'(P)$. If Assumption 3.1 does not hold, then $\sigma$ and $\sigma'$ can be defined within $R(A^t)$.

Definition 3.4: the condition number $\nu$ [13,14]:

1. For a cone (say $C_P(F)$), the following are equivalent definitions of $\nu(C_P(F))$:

$\nu(C_P(F)) = \sup\{\sin\alpha : \{x \in \mathbb{R}^n : (x,e) \ge |x|\,|e|\cos\alpha\} \subset C_P(F)\}$

$\phantom{\nu(C_P(F))} = \sup\{\sin\alpha : \{x \in \mathbb{R}^n : (x,e') \ge |x|\,|e'|\sin\alpha\} \supset N_P(F)\}$

$\phantom{\nu(C_P(F))} = \min\{|g| : g \in \partial f(F)\}$

$\phantom{\nu(C_P(F))} = \sup\{r \ge 0 : (rS) \cap \partial f(F) = \emptyset\}$

$\phantom{\nu(C_P(F))} = -\inf\{f'(x;d) : |d| = 1\}$

$\phantom{\nu(C_P(F))} = \min\{(\lambda^t \Gamma(F)\lambda)^{1/2}/|\lambda|_1 : \lambda \ge 0,\ \lambda \ne 0\}$,

where $f'(x;\cdot)$ is the directional derivative of f at a point $x \in \mathrm{ri}\,F$, $\Gamma(F)$ is the Gramian of the vectors $a^i/|a^i|$, $i \in I(F)$, and $|\lambda|_1 = \sum_{i \in I(F)} |\lambda_i|$ is the $\ell_1$ norm.

Some properties of $\nu$ will be used repeatedly (for proofs, see [13,14]):

i. $\nu(C_P(F)) > 0$ if and only if $\dim C_P(F) = n$; $\nu(C_P(F)) \le 1$.

ii. $C_1 \subset C_2$, where $C_1$ and $C_2$ are convex cones, implies $\nu(C_1) \le \nu(C_2)$.

iii. $\nu(C_P(F))$ is a lattice monotone function on $\mathcal{F}(P)$, i.e., $F' \subset F$ with $F', F \in \mathcal{F}(P)$ implies $\nu(C_P(F')) \le \nu(C_P(F))$.

iv. $\nu(C_P(F))$ is a function of $C_P(F)$, or $N_P(F)$, or $\partial f(F)$, as geometrical objects.

v. $\sin^{-1}\nu(C_P(F))$ could be called the angle of the cone $C_P(F)$.

vi. Let e, e', g, d and $\lambda$ be the vectors where the various infima and suprema are attained; then

$g = -\nu(C_P(F))\, e/|e|$,

$d = e/|e| = -e'/|e'|$,

$g = \left(\sum_{i \in I(F)} \lambda_i a^i/|a^i|\right) / \sum_{i \in I(F)} \lambda_i$.

2. $\nu(P) = \min\{\nu(C_P(F)) : F \in \mathcal{F}(P),\ F \ne \emptyset\}$.

The same notation is used as in 1. This is consistent because $\nu$ is lattice monotone: if P = C, a convex cone, then $\nu(C) = \nu(C_C(L'))$, where L', the lineality space of C, is the minimal element of $\mathcal{F}(C)$.
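The sketch below checks two of the equivalent expressions for $\nu$ in Definition 3.4, for cones in the plane with hypothetical normals. The minimum-norm point of $\partial f(F)$ is found with Gilbert's algorithm, a Frank-Wolfe method with exact line search; this solver is our own illustrative choice and is not taken from the text. The directional-derivative form is evaluated by sampling unit directions, which is only practical in $\mathbb{R}^2$.

```python
import numpy as np

def unit_rows(normals):
    U = np.asarray(normals, dtype=float)
    return U / np.linalg.norm(U, axis=1, keepdims=True)      # a^i / |a^i|

def nu_min_norm(normals, iters=1000, tol=1e-12):
    """nu = min{|g| : g in conv{a^i/|a^i|}}, via Gilbert's algorithm."""
    U = unit_rows(normals)
    g = U[0].copy()
    for _ in range(iters):
        v = U[int(np.argmin(U @ g))]        # vertex minimizing (g, v)
        gap = g @ g - g @ v                 # Frank-Wolfe gap; zero at the minimum-norm point
        if gap <= tol:
            break
        d = g - v
        g = g - min(1.0, gap / (d @ d)) * d # exact line search on |g - t d|^2, t in [0, 1]
    return float(np.linalg.norm(g))

def nu_directional(normals, samples=3600):
    """nu = -inf{f'(x; d) : |d| = 1} in R^2, with f'(x; d) = max_i (a^i, d)/|a^i| at x in ri F."""
    U = unit_rows(normals)
    t = np.linspace(0.0, 2.0 * np.pi, samples, endpoint=False)
    D = np.stack([np.cos(t), np.sin(t)], axis=1)            # sampled unit directions
    return float(-np.min(np.max(D @ U.T, axis=1)))
```

For the quadrant $\{x : -x_1 \le 0,\ -x_2 \le 0\}$ both give $1/\sqrt2$. A single halfspace gives 1, and a line (two opposite normals, not full dimensional) gives 0, in agreement with property i.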


Definition 3.3: the condition numbers $\mu$ and $\mu^*$ [1,13,14]:

1. $\mu(C_P(F)) = \operatorname{Inf}_{x \in N_P(F),\ x \ne 0}\ \operatorname{Max}_{i \in I(F)} \dfrac{\langle a^i, x\rangle}{\|a^i\|\,\|x\|}$;

this definition is due to Agmon [1], who showed that $\mu(C_P(F)) \in (0,1]$, and this under no assumptions whatsoever on $P$ (except $P \ne \emptyset$). It should be said that $\mu$ is not a function of $C_P(F)$ as a geometrical object, but depends on the actual representation of $C_P(F)$ by an index set $I(F)$; also $\mu(C_P(F))$ is not lattice monotone on $\mathcal{F}(P)$.

2. $\mu^*(C_P(F)) = \operatorname{Min}\{\mu(C_P(F')) : F' \in \mathcal{F}(P),\ F' \supseteq F\}$; this is, obviously, lattice monotone.

3. $\mu^*(P) = \operatorname{Min}\{\mu^*(C_P(F)) : F \in \mathcal{F}(P),\ F \ne \emptyset\}$

$\qquad\ \ = \operatorname{Min}\{\mu(C_P(F)) : F \in \mathcal{F}(P),\ F \ne \emptyset\}$;

the same notation has been used as in §2, as it will be shown later that the two definitions coincide.
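Definition 3.3.1 can be explored numerically. The NumPy sketch below is only an illustration (the function name `agmon_mu_upper_bound` and the random sampling scheme are assumptions, not part of the original development): it samples points $x = \sum_i \lambda_i a^i$ of the normal cone $N_P(F)$ with random nonnegative weights and evaluates $\max_i \langle a^i,x\rangle/(\|a^i\|\,\|x\|)$. The smallest sampled value is an upper bound on $\mu(C_P(F))$ that tightens as the sample grows.

```python
import numpy as np


def agmon_mu_upper_bound(normals, samples=20000, seed=0):
    """Sampled upper bound on Agmon's mu(C_P(F)) (Definition 3.3.1).

    `normals` holds the rows a^i, i in I(F).  Points of the normal cone
    N_P(F) = cone{a^i} are generated as random nonnegative combinations of
    the rows; the minimum over the sample of max_i <a^i, x>/(|a^i| |x|)
    bounds mu from above.
    """
    A = np.asarray(normals, dtype=float)
    U = A / np.linalg.norm(A, axis=1, keepdims=True)   # rows a^i / |a^i|
    rng = np.random.default_rng(seed)
    lam = rng.random((samples, A.shape[0]))            # nonnegative weights
    X = lam @ A                                        # sample points of N_P(F)
    norms = np.linalg.norm(X, axis=1)
    keep = norms > 1e-12                               # exclude x = 0
    ratios = (X[keep] @ U.T).max(axis=1) / norms[keep]
    return ratios.min()


# Two orthogonal normals in R^2: the infimum is attained along x = (1, 1)
# and equals 1/sqrt(2).
print(agmon_mu_upper_bound([[1.0, 0.0], [0.0, 1.0]]))
```

For this cone the value $1/\sqrt{2}$ is also the distance from the origin to the segment joining $e_1$ and $e_2$, i.e. $\nu$ of the same cone.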

The following lemma gives a geometric characterization of $\mu^*$, which shows that it depends on the lattice of subdifferentials $\{\partial f(F) : F \in \mathcal{F}(P)\}$.

Lemma 3.6

Let $\langle a^i,x\rangle - b_i \le 0$, $i \in M$, be a consistent system of linear inequalities, $f(x) = \operatorname{Max}_{i \in M}\big((\langle a^i,x\rangle - b_i)/\|a^i\|\big)$, and $P = \{x \in \mathbb{R}^n : f(x) \le 0\}$; then, for every $F \in \mathcal{F}(P)$,

$\mu^*(C_P(F)) = \operatorname{Sup}\{r > 0 : (rS) \cap N_P(F) \subseteq \operatorname{conv}(\partial f(F) \cup \{0\})\}$.


Proof: 

Let $r^*$ be the value of the supremum, where $r^* \le 1$, as $\partial f(F) \subseteq S$.

Denote by $h(y;K) = \operatorname{Sup}\{y^t x : x \in K\}$ the support function of a set $K$, and by $h_1(y)$ the support function of $\operatorname{conv}(\partial f(F) \cup \{0\})$, i.e.,

$h_1(y) = \operatorname{Max}\Big(\operatorname{Max}_{i \in I(F)} \dfrac{\langle a^i,y\rangle}{\|a^i\|},\ 0\Big)$.

Now

$h(y; (rS) \cap N_P(F)) = \operatorname{Inf}\{h(y_1; rS) + h(y_2; N_P(F)) : y_1 + y_2 = y\}$ (see Rockafellar [30], p. 146)

$\qquad = \operatorname{Inf}\{r\|y - y_2\| : y_2 \in C_P(F)\}$

(as $h(y_1; rS) = r\|y_1\|$, while $h(y_2; N_P(F)) = 0$ if $y_2 \in C_P(F)$ and $+\infty$ if $y_2 \notin C_P(F)$)

$\qquad = r\, d(y; C_P(F))$,

where $d(\cdot\,;\cdot)$ represents the distance between a point and a set.

Using the fact that $K_1 \subseteq K_2$ ($K_1$ and $K_2$ compact and convex) if and only if $h(y;K_1) \le h(y;K_2)$ for all $y$, one gets

$r^* = \operatorname{Sup}\{r > 0 : r\, d(y; C_P(F)) \le h_1(y) \text{ for all } y \in \mathbb{R}^n\}$

$\quad\ = \operatorname{Inf}\{h_1(y)/d(y; C_P(F)) : y \notin C_P(F)\}$,

using the fact that $h_1$ and $d$ are zero on $C_P(F)$, and positive elsewhere.
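Using Lemma 3.2, every $y$ may be written as $y = y_1 + y_2$, with $y_1 \in N_P(F)$, $y_2 \in C_P(F)$ and $\langle y_1, y_2\rangle = 0$; to every $y$, one may associate, uniquely, a face $F'$ of $P$ such that $y_2 \in \operatorname{ri} C_P(F')$. Clearly

$\langle a^i, y_2\rangle = 0$, for $i \in I(F')$,

and

$\langle a^i, y_2\rangle < 0$, for $i \in I(F)$, $i \notin I(F')$.

Thus

$r^* = \operatorname{Min}_{F' \supseteq F,\ F' \in \mathcal{F}(P)}\ \operatorname{Inf}\{h_1(y_1 + y_2)/\|y_1\| : y_1 \in N_P(F'),\ y_1 \ne 0,\ y_2 \in \operatorname{ri} C_P(F')\}$;

but, if $y_1 \ne 0$, $y_1 \in N_P(F')$, one has

$\operatorname{Inf}\{h_1(y_1 + y_2)/\|y_1\| : y_2 \in \operatorname{ri} C_P(F')\}$

$\quad = \operatorname{Inf}_{y_2 \in \operatorname{ri} C_P(F'),\ \|y_2\| = 1}\ \operatorname{Inf}_{\varepsilon > 0}\ \operatorname{Max}_{i \in I(F)} \Big\{\dfrac{\langle a^i,y_1\rangle}{\|a^i\|\,\|y_1\|} + \varepsilon\, \dfrac{\langle a^i,y_2\rangle}{\|a^i\|\,\|y_1\|}\Big\}$

$\quad = \operatorname{Max}\Big\{\dfrac{\langle a^i,y_1\rangle}{\|a^i\|\,\|y_1\|} : i \in I(F')\Big\}$.

Hence,

$r^* = \operatorname{Min}\{\mu(C_P(F')) : F' \supseteq F,\ F' \in \mathcal{F}(P)\} = \mu^*(C_P(F))$. QED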
Lemma 3.7

Definition 3.8

A representation of a polyhedron $P$ by a system of inequalities $(\langle a^i,x\rangle - b_i)/\|a^i\| \le 0$, $i \in M$, where $P = \{x \in \mathbb{R}^n : f(x) \le 0\}$, is nondegenerate if every face $F$ of $P$ is nondegenerate; a face $F$ is nondegenerate if the affine hull of $\partial f(F)$ does not contain the origin. Equivalent statements of the nondegeneracy of $F$ are:

(i) $\sum_{i \in I(F)} \lambda_i a^i/\|a^i\| = 0$, $\sum_{i \in I(F)} \lambda_i = 1$, does not have a solution;

(ii) $\langle a^i,x\rangle/\|a^i\| = 1$ for all $i \in I(F)$ has a solution.

This notion of nondegeneracy is implied by the usual definition (every set $\{a^i : i \in I(F)\}$ is linearly independent).
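Statement (ii) gives a direct computational test. The NumPy sketch below is illustrative (the helper name `is_nondegenerate` and the tolerance are assumptions): it normalizes the active rows and checks, by least squares, whether $\langle a^i,x\rangle/\|a^i\| = 1$, $i \in I(F)$, has an exact solution.

```python
import numpy as np


def is_nondegenerate(active_rows, tol=1e-9):
    """Test (ii) of Definition 3.8: is <a^i, x>/|a^i| = 1, i in I(F), solvable?"""
    A = np.asarray(active_rows, dtype=float)
    U = A / np.linalg.norm(A, axis=1, keepdims=True)   # normalized rows
    ones = np.ones(U.shape[0])
    x, *_ = np.linalg.lstsq(U, ones, rcond=None)       # least-squares solution
    return bool(np.linalg.norm(U @ x - ones) <= tol)   # zero residual <=> solvable


# Vertex of the negative orthant in R^2: linearly independent normals.
print(is_nondegenerate([[1, 0], [0, 1]]))            # True
# Three normals in R^2: their affine hull is the whole plane, so it contains 0.
print(is_nondegenerate([[1, 0], [0, 1], [1, 1]]))    # False
```

Because the rows are normalized first, rescaling an inequality does not change the answer, whereas replacing the normalization (as discussed next) can.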

If  P  has  an  empty  interior,  then  every  face  of  P  is 
degenerate. 

The definition of degeneracy depends upon the normalization chosen, and thus it is quite possible for $P = \{x : f(x) \le 0\}$ to be nondegenerate, while $P = \{x : \tilde f(x) \le 0\}$ or $P = \{x : \phi(x,T) \le 0\}$ would be degenerate. It is invariant under linear transformations, but not under a linear transformation followed by a renormalization.

Now, using Lemma 3.6 or [13,14], it follows that $\mu^*(P) \ge \nu(P)$, while a simple extension of a proof of [14] (to degeneracy as defined here, rather than the usual definition) gives $\mu^*(P) = \nu(P)$ if $P$ is nondegenerate. This also follows immediately from Lemma 3.11 (but that is a very roundabout proof).

We shall now describe a classification of the faces of the subdifferentials $\partial f(F)$, where $F$ is a face of $P$; this will lead to alternate definitions of $\mu^*$ and $\nu$, which will permit a study of the behavior of $\mu^*$ and $\nu$ under perturbations of $P$.
Let $\partial f(F) = \operatorname{conv}\{a^i/\|a^i\| : i \in I(F)\}$ be the subdifferential of $f$ at $F$; denote its face lattice by $\mathcal{F}(\partial f(F))$ and define $\mathcal{I}(\partial f(F))$ to be the associated index lattice (which to $D$, a face of $\partial f(F)$, associates $J(D) = \{i \in I(F) : a^i/\|a^i\| \in D\}$), and which is clearly isomorphic to $\mathcal{F}(\partial f(F))$. Notice that $I(F) = J(\partial f(F))$.

Definition 3.9

Let $D$ be a face of $\partial f(F)$, $F \in \mathcal{F}(P)$, and $J(D)$ be the corresponding index set; then $D$ is an outside face (resp. inside face, resp. side face) of $\partial f(F)$ if, for $\varepsilon = 1$ (resp. $\varepsilon = -1$, resp. $\varepsilon = 0$), there exists a solution $x$ to:

$\langle a^i,x\rangle/\|a^i\| = \varepsilon$, $i \in J(D)$,

$\langle a^i,x\rangle/\|a^i\| < \varepsilon$, $i \in I(F)$, $i \notin J(D)$.

This definition means that there exists a halfspace whose intersection with $\partial f(F)$ is $D$, and such that this halfspace does not contain the origin (resp. contains the origin in its interior, resp. contains the origin on its boundary); with the proviso that $\partial f(F)$ is always defined as a side face (with $x = 0$).

Every face of $\partial f(F)$ satisfies at least one of the three definitions. Also $\partial f(F)$ is an outside face if and only if it is an inside face, if and only if $F$ is nondegenerate; the side faces of $\partial f(F)$ are essentially the faces of $N_P(F)$. The polyhedron $P$ has dimension $n$ if and only if every, or any, $\partial f(F)$ has an inside face (see Lemma 3.11).
The definitions of outside and inside faces are "invariant" under linear transformations, but not under normalizations, while side faces are "invariant" under both.

Similar definitions may be given for the subdifferentials of $\tilde f$, $\hat f$, $\phi$, $\hat\phi$.


Lemma 3.10

Let $\partial f(F)$ be the subdifferential of $f$ at a face $F$ of $P$; then any face $D$ of $\partial f(F)$ which is both an inside and an outside face is also a side face. If $F$ is nondegenerate, then $D$ is inside if and only if it is outside. A face $F$ is nondegenerate if and only if every face $D$ of $\partial f(F)$ is both inside and outside.

Proof:

Let $D$ be a face of $\partial f(F)$ which is both outside and inside, and let $x_1$ and $x_2$ be the corresponding values of $x$ given by the definitions of outside and inside faces; then $x_1 + x_2$ satisfies the definition of a side face, and hence $D$ is a side face.

Now, assume that $F$ is nondegenerate, and thus there exists a $y$ such that

$\langle a^i,y\rangle/\|a^i\| = 1$, for all $i \in I(F)$;

if $D$ is an outside face, and $x_1$ is given by the definition of an outside face, then

$\langle a^i, x_1 - 2y\rangle/\|a^i\| = -1$, $i \in J(D)$,

$\langle a^i, x_1 - 2y\rangle/\|a^i\| < -1$, $i \in I(F)$, $i \notin J(D)$,

and thus $D$ is an inside face. The converse statement follows similarly.

If $F$ is nondegenerate, then:

i) if $D$ is an outside face, it is also inside, and side;

ii) if $D$ is an inside face, it is also outside, and side;

iii) if $D$ is a side face, and $x_3$ is given by the definition of a side face, then $x_3 + y$ satisfies the definition of an outside face, and $x_3 - y$ satisfies the definition of an inside face;

as every face is either inside, outside or side, the lemma is proved. QED
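The certificate manipulations in the proof of Lemma 3.10 are easy to check numerically. In the NumPy sketch below (illustrative; `face_type` is a hypothetical helper), a vector $x$ is tested against Definition 3.9 with $\varepsilon = 1, -1, 0$ for the face with index set $J(D)$. Starting from an outside certificate $x_1$ and a vector $y$ with $\langle a^i,y\rangle/\|a^i\| = 1$ on $I(F)$, the vectors $x_1 - 2y$, $x_1 + (x_1 - 2y)$, $x_3 + y$ and $x_3 - y$ should certify the inside, side, outside and inside properties, as in the proof.

```python
import numpy as np


def face_type(active_rows, J, x, tol=1e-9):
    """Return "outside", "inside" or "side" if x certifies that property
    (Definition 3.9, eps = 1, -1, 0) for the face with index set J,
    or None if x certifies none of them."""
    A = np.asarray(active_rows, dtype=float)
    s = (A @ x) / np.linalg.norm(A, axis=1)      # <a^i, x> / |a^i|
    in_J = np.zeros(len(A), dtype=bool)
    in_J[list(J)] = True
    for eps, name in ((1.0, "outside"), (-1.0, "inside"), (0.0, "side")):
        equal_on_J = np.all(np.abs(s[in_J] - eps) <= tol)
        strict_off_J = np.all(s[~in_J] < eps - tol)
        if equal_on_J and strict_off_J:
            return name
    return None


rows = np.array([[1.0, 0.0], [0.0, 1.0]])   # I(F) = {0, 1}: F is nondegenerate
J = [0]                                     # the face D = {a^0/|a^0|}
y = np.array([1.0, 1.0])                    # <a^i, y>/|a^i| = 1 on I(F)
x1 = np.array([1.0, 0.0])                   # outside certificate
x2 = x1 - 2 * y                             # inside certificate (proof of Lemma 3.10)
x3 = x1 + x2                                # side certificate
print(face_type(rows, J, x1), face_type(rows, J, x2), face_type(rows, J, x3))
# outside inside side
print(face_type(rows, J, x3 + y), face_type(rows, J, x3 - y))
# outside inside
```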

Lemma  3.11 

Let $\partial f(F)$ be the subdifferential of $f$ at a face $F$ of $P$; then

$\mu^*(C_P(F)) = \operatorname{Min}\{d(0;D) : D \text{ is an outside face of } \partial f(F)\}$,

while if $P$ has an interior, then

$\nu(C_P(F)) = \operatorname{Min}\{d(0;D) : D \text{ is an inside face of } \partial f(F)\}$,

where $d(0;D)$ is the distance between the origin and the set $D$. Before proving Lemma 3.11, two technical lemmas will be given.

Lemma  3.12 

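Let $D$ be a compact polyhedron such that its affine hull does not contain the origin; then

$d(0;D) = \operatorname{Sup}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \subseteq \operatorname{conv}(D \cup \{0\})\}$,

where $\operatorname{cone}(\cdot)$ means cone hull.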


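Proof:

By hypothesis, there exists a $y$ such that $\langle y,x\rangle = 1$ for all $x \in D$. Also

$\operatorname{cone}(D) = \bigcup_{\lambda \ge 0} \lambda D$ and $\operatorname{conv}(D \cup \{0\}) = \bigcup_{\lambda \in [0,1]} \lambda D$,

and thus

$D = \operatorname{cone}(D) \cap \{x : \langle y,x\rangle = 1\}$,

$\operatorname{conv}(D \cup \{0\}) = \operatorname{cone}(D) \cap \{x : \langle y,x\rangle \le 1\}$.

Now

$d(0;D) = \operatorname{Min}\{r \ge 0 : (rS) \cap D \ne \emptyset\}$

$\quad = \operatorname{Min}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \cap \{x : y^t x = 1\} \ne \emptyset\}$

$\quad = \operatorname{Min}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \cap \{x : y^t x \ge 1\} \ne \emptyset\}$

$\quad = \operatorname{Sup}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \subseteq \{x : y^t x < 1\}\}$

$\quad = \operatorname{Sup}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \subseteq \{x : y^t x \le 1\}\}$

$\quad = \operatorname{Sup}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \subseteq \operatorname{conv}(D \cup \{0\})\}$. QED

Lemma 3.13

Let $\partial f(F)$ be as before; then every $x \in \operatorname{conv}(\partial f(F) \cup \{0\})$, $x \ne 0$, belongs to one and only one of the sets $\operatorname{conv}((\operatorname{ri} D) \cup \{0\})$, where $D$ is any outside face of $\partial f(F)$.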


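Proof:

Let $D$ be any outside face of $\partial f(F)$, and $y$ be such that:

$\langle y,x\rangle = 1$, $x \in D$,

$\langle y,x\rangle < 1$, $x \in \partial f(F)$, $x \notin D$.

Every face $D'$ of $D$ is an outside face: let $y'$ satisfy

$D' = \{x \in D : \langle y',x\rangle = \operatorname{Max}_{x' \in D} \langle y',x'\rangle\}$;

then $(y + \varepsilon y')/(1 + \varepsilon\langle y',x'\rangle)$, where $x' \in D'$ and $\varepsilon$ is positive and small enough, shows that $D'$ satisfies the definition of an outside face.

If $x \in D$, then $\langle y,\lambda x\rangle = \lambda$, and thus $\lambda x \notin D$ if $\lambda \ne 1$. Also, if $\lambda > 1$, then $\langle y,\lambda x\rangle > 1$ and $\lambda x \notin \partial f(F)$.

Now, let $x \in \operatorname{conv}(\partial f(F) \cup \{0\})$, $x \ne 0$; define $x' = \rho' x$ where $\rho' = \operatorname{Max}\{\rho \ge 1 : \rho x \in \partial f(F)\}$. Clearly $\{\rho x' : \rho \ge 1\}$ and $\partial f(F)$ have disjoint relative interiors, and thus they can be (strictly) separated; i.e., there exists a $z \in \mathbb{R}^n$ such that:

$\langle z, \rho x'\rangle > \langle z, x'\rangle \ge \langle z, x''\rangle$, for all $\rho > 1$, $x'' \in \partial f(F)$.

Hence $\langle z,x'\rangle > 0$; but as $x' \in \partial f(F)$, $x' = \sum_{i \in I(F)} \lambda_i a^i/\|a^i\|$, with $\lambda_i \ge 0$, $i \in I(F)$, and $\sum_{i \in I(F)} \lambda_i = 1$, it follows that:

$\sum_{i \in I(F)} \lambda_i \big[\langle z,x'\rangle - \langle z,a^i\rangle/\|a^i\|\big] = 0$.

Using this, and $\lambda_i \ge 0$, $\langle z,x'\rangle \ge \langle z,a^i\rangle/\|a^i\|$, leads to $\lambda_i\big[\langle z,x'\rangle - \langle z,a^i\rangle/\|a^i\|\big] = 0$, for all $i \in I(F)$. Hence there exists a face $D'$ of $\partial f(F)$, which is an outside face, such that $J(D') = \{i \in I(F) : \langle z,a^i\rangle/\|a^i\| = \langle z,x'\rangle\}$, and furthermore $\lambda_i = 0$ if $i \in I(F)$, $i \notin J(D')$, implying that $x' = \sum_{i \in J(D')} \lambda_i a^i/\|a^i\|$, and $x' \in D'$.

Now let $D''$ be the smallest face of $D'$ which contains $x'$; one has $x' \in \operatorname{ri} D''$, $D''$ is an outside face, and $x \in \operatorname{conv}((\operatorname{ri} D'') \cup \{0\})$. It is also clear that $D''$ is unique. QED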


Proof of Lemma 3.11

One has

$\operatorname{Min}\{d(0;D) : D \text{ is an outside face of } \partial f(F)\}$

$\quad = \operatorname{Min}_{D \text{ outside}}\ \operatorname{Sup}\{r \ge 0 : (rS) \cap \operatorname{cone}(D) \subseteq \operatorname{conv}(D \cup \{0\})\}$ (by Lemma 3.12)

$\quad = \operatorname{Sup}\{r > 0 : (rS) \cap N_P(F) \subseteq \operatorname{conv}(\partial f(F) \cup \{0\})\}$ (by Lemma 3.13)

$\quad = \mu^*(C_P(F))$ (by Lemma 3.6).

Now, it is clear that if $\dim P < n$, then there are no inside faces and $\nu(C_P(F)) = 0$ for all $F \in \mathcal{F}(P)$.

Using the characterization of $\nu(C_P(F))$ as $d(0;\partial f(F))$, and letting $g$ be the element of minimum norm in $\partial f(F)$, it follows that (if $\dim P = n$):

$\langle g, x - g\rangle \ge 0$, for all $x \in \partial f(F)$,

and $d(0;\partial f(F)) = \|g\| \ne 0$; now if we let

$D^* = \{x \in \partial f(F) : \langle g, x - g\rangle = 0\}$,

it is clear (taking $x = -g/\|g\|^2$ in Definition 3.9) that $D^*$ is an inside face of $\partial f(F)$. But, as $D \subseteq \{x : \langle g, x - g\rangle \ge 0\}$, it is clear that $\|g\| = d(0;D^*) \le d(0;D)$ for every face $D$ of $\partial f(F)$. Hence the lemma. QED

Note that the last part of the proof implies that if $\dim P = n$, then every $\partial f(F)$, $F \in \mathcal{F}(P)$, has at least one inside face.
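The characterization $\nu(C_P(F)) = d(0;\partial f(F)) = \|g\|$ suggests a simple way to compute $\nu$: find the minimum-norm point $g$ of the polytope $\operatorname{conv}\{a^i/\|a^i\| : i \in I(F)\}$. The sketch below uses the Frank-Wolfe (conditional gradient) method with exact line search; this is a generic, swapped-in technique rather than a method of this paper, and the helper name `min_norm_point` and the iteration cap are assumptions.

```python
import numpy as np


def min_norm_point(active_rows, iters=2000):
    """Approximate the element g of minimum norm in conv{a^i/|a^i|}
    by Frank-Wolfe with exact line search; returns (g, |g|)."""
    A = np.asarray(active_rows, dtype=float)
    U = A / np.linalg.norm(A, axis=1, keepdims=True)   # vertices of the subdifferential
    g = U[0].copy()                                     # start at a vertex
    for _ in range(iters):
        s = U[np.argmin(U @ g)]        # vertex minimizing <gradient, v> (gradient of |g|^2/2 is g)
        d = s - g
        dd = d @ d
        if dd <= 1e-18 or g @ d >= 0:  # no descent direction: g is optimal
            break
        t = min(1.0, -(g @ d) / dd)    # exact minimizer of |g + t d| on [0, 1]
        g = g + t * d
    return g, np.linalg.norm(g)


print(min_norm_point([[1.0, 0.0], [0.0, 1.0]]))    # g = (1/2, 1/2), nu = 1/sqrt(2)
print(min_norm_point([[1.0, 0.0], [-1.0, 0.0]]))   # opposite normals: g = 0, nu = 0
```

The second call corresponds to a face of a polyhedron without interior, where, as noted above, $\nu(C_P(F)) = 0$.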

Theorem 3.14

Let $\langle a^i,x\rangle \le b_i$, $i \in M$, be a consistent system of linear inequalities, with solution set $P$; then

$\mu^*(P) \ge \nu(P) \ge (\sigma(P))^{-1} \ge (\sigma'(P))^{-1}$,

where $\sigma(P)$ and $\sigma'(P)$ are taken as $+\infty$ if $P$ is unbounded, or has no interior; furthermore, if $P$ is nondegenerate, then $\mu^*(P) = \nu(P)$.

Proof:

Lemma 3.11 implies that, for every $F \in \mathcal{F}(P)$, one has

$\mu^*(C_P(F)) = \operatorname{Min}\{d(0;D) : D \text{ is an outside face of } \partial f(F)\}$

$\quad \ge \operatorname{Min}\{d(0;D) : D \text{ is a face of } \partial f(F)\}$

$\quad = \nu(C_P(F))$; hence $\mu^*(P) \ge \nu(P)$.
If $F$ is nondegenerate, then Lemmas 3.10 and 3.11 imply that $\mu^*(C_P(F)) = \nu(C_P(F))$, and thus if $P$ is nondegenerate then $\mu^*(P) = \nu(P)$.

From the definition of $\sigma'(P)$ (or of $\sigma(P)$), there exists a sphere of some radius $r > 0$ contained in $P$, such that a concentric sphere of radius $\sigma r$ contains $P$ (where $\sigma$ is either $\sigma(P)$ or $\sigma'(P)$); thus every tangent cone $C_P(x)$, where $x$ belongs to the boundary of $P$, contains a spherical cone of angle $\sin^{-1}(r/\sigma r)$, and thus

$\nu(C_P(x)) \ge \sin[\sin^{-1}(r/\sigma r)] = \sigma^{-1}$, whence $\nu(C_P(F)) \ge \sigma^{-1}$, and

$\nu(P) \ge (\sigma(P))^{-1} \ge (\sigma'(P))^{-1}$. QED
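A regular polygon shows that the bound $\nu(P) \ge (\sigma(P))^{-1}$ can be tight. The sketch below assumes, for this example only, that the asphericity is measured by the ratio of the circumscribed to the inscribed radius about the polygon's center (the function name is hypothetical). With $P = \{x : \langle n_k, x\rangle \le 1,\ k = 0, \ldots, m-1\}$ and unit normals $n_k$ at angles $2\pi k/m$, the subdifferential at a vertex is the segment $[n_0, n_1]$, and its distance to the origin, $\cos(\pi/m)$, coincides with $r/(\sigma r)$.

```python
import numpy as np


def polygon_nu_and_inverse_asphericity(m):
    """Regular m-gon {x: <n_k, x> <= 1}: return (nu at a vertex, r_in / r_out)."""
    angles = 2 * np.pi * np.arange(m) / m
    N = np.column_stack([np.cos(angles), np.sin(angles)])   # unit normals n_k
    # Vertex where constraints 0 and 1 are active: <n_0, x> = <n_1, x> = 1.
    vertex = np.linalg.solve(N[:2], np.ones(2))
    # nu(C_P(vertex)) = distance from 0 to the segment [n_0, n_1].
    a, b = N[0], N[1]
    t = np.clip(-(a @ (b - a)) / ((b - a) @ (b - a)), 0.0, 1.0)
    nu = np.linalg.norm(a + t * (b - a))
    r_in = 1.0                       # every edge lies at distance 1 from the center
    r_out = np.linalg.norm(vertex)   # all vertices are equidistant from the center
    return nu, r_in / r_out


for m in (3, 4, 6, 12):
    nu, inv_sigma = polygon_nu_and_inverse_asphericity(m)
    print(m, nu, inv_sigma, np.cos(np.pi / m))
```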

In all that precedes, $P$ should be viewed as a representative of the class

$P_w = \{x \in \mathbb{R}^n : (\langle a^i,x\rangle - b_i)/\|a^i\| \le w,\ i \in M\}$,

and thus every definition, or result, which relates to $P$ also extends, verbatim, to every $P_w$. The behavior of the various lattices, and condition numbers, as $w$ varies will now be described. The key to this somewhat exciting study and munificent enterprise is given by the relationship between the faces of $P_w$ and the faces of $Q = \{(x,w) \in \mathbb{R}^{n+1} : f(x) \le w\}$.
Definition 3.15: the vertex level set $W$

Define $W = \{w_1, w_2, \ldots, w_K\}$, where $0 < w_1 < w_2 < \cdots < w_K$, as the set of the positive $w$ coordinates of the vertices of the set $Q$.

A similar definition of $W$ can be given if Assumption 3.1 does not hold. Analogous concepts can also be defined for $\tilde Q$, $\hat Q$, etc. The level $w = w_0 = 0$ may or may not contain vertices of $Q$, but it always does if $P$ is not full dimensional (under Assumption 3.1).

The symbol $w_{-1}$ will be used, at times, for the first negative vertex level.
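For small examples the vertex level set $W$ can be found by brute force. The sketch below assumes (for illustration) that every vertex of $Q$ is determined by $n+1$ active constraints $(\langle a^i,x\rangle - b_i)/\|a^i\| = w$ with a nonsingular coefficient matrix; it enumerates all such $(n+1)$-subsets, solves for $(x,w)$, keeps the solutions lying in $Q$, and returns their $w$ coordinates. The function name `vertex_levels` is hypothetical, and the enumeration is exponential in $n$, so this is only a didactic check.

```python
import itertools

import numpy as np


def vertex_levels(A, b, tol=1e-9):
    """Sorted w coordinates of the vertices of Q = {(x, w): f(x) <= w},
    where f(x) = max_i (<a^i, x> - b_i)/|a^i|."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    norms = np.linalg.norm(A, axis=1)
    U, c = A / norms[:, None], b / norms            # normalized system: U x - c <= w
    m, n = U.shape
    levels = set()
    for S in itertools.combinations(range(m), n + 1):
        S = list(S)
        M = np.column_stack([U[S], -np.ones(n + 1)])   # rows: <u_i, x> - w = c_i
        if abs(np.linalg.det(M)) < tol:
            continue                                   # no unique intersection point
        sol = np.linalg.solve(M, c[S])
        x, w = sol[:n], sol[n]
        if np.max(U @ x - c) <= w + tol:               # (x, w) belongs to Q
            levels.add(round(float(w), 12))
    return sorted(levels)


# Triangle x1 <= 0, x2 <= 0, -x1 - x2 <= 1, plus the redundant x2 >= -1.2.
A = [[1, 0], [0, 1], [-1, -1], [0, -1]]
b = [0, 0, 1, 1.2]
levels = vertex_levels(A, b)
W = [w for w in levels if w > 0]     # the vertex level set W
print(levels, W)
```

Here $w_1 = 0.2/\sqrt{2} \approx 0.1414$ is the level at which the redundant constraint starts to touch the expanding triangle $P_w$ (so $P_{w_1}$ has a degenerate vertex), while the negative level $-1/(2+\sqrt{2})$ is that of the incenter.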

Definition 3.16

A face $F_Q$ of $Q$ is called horizontal if it is entirely contained in a (horizontal) hyperplane $w = $ constant; if we let

$I(F_Q) = \{i \in M : (\langle a^i,x\rangle - b_i)/\|a^i\| = w \text{ for all } (x,w) \in F_Q\}$,

where $F_Q \in \mathcal{F}(Q)$, then $F_Q$ is horizontal if and only if

$\langle a^i,x\rangle/\|a^i\| = 1$, $i \in I(F_Q)$,

does not have a solution.

Every horizontal face of $Q$ (which has positive $w$ coordinates) is contained in one of the horizontal hyperplanes

$w = w_k$, $k = 1, \ldots, K$.

For a given $w$, the faces of $P_w$ are related to the faces of $Q$; let $F_Q$ be any face of $Q$ whose relative interior intersects the hyperplane at level $w$, and let $F$ be the corresponding face of $P_w$ (where $I(F_Q) = I(F)$); then there are two (disjoint) cases:

i. $F_Q$ is horizontal; then $\dim F = \dim F_Q$, and $F$ is degenerate.

ii. $F_Q$ is not horizontal; then $\dim F = \dim F_Q - 1$, and $F$ is not degenerate.

This is clear, as the degeneracy of $F$ is given by the same condition as the horizontality of $F_Q$.

Lemma 3.17

On each open interval between consecutive vertex levels (say $(w_k, w_{k+1})$), the index lattices of $P_w$ are constant, while the face lattices $\mathcal{F}(P_w)$ are isomorphic, and $\mu^*(P_w) = \nu(P_w)$ is constant.

The polyhedron $P_w$ is degenerate if and only if $w$ is a vertex level.

Proof:

Take $w \in (w_k, w_{k+1})$ (say); then every face $F$ of $P_w$ is generated by a nonhorizontal face $F_Q$ of $Q$, such that $I(F) = I(F_Q)$. The projection of $F_Q$ on the $w$ axis contains at least the closed interval $[w_k, w_{k+1}]$, and thus the projection of $\operatorname{ri} F_Q$ contains at least $(w_k, w_{k+1})$.
Lemma 3.18

Let $w_k$ be an arbitrary vertex level of $Q$; then for any $w \in (w_k, w_{k+1})$ (resp. $w \in (w_{k-1}, w_k)$, assuming that $P_{w_k}$ has dimension $n$) the index lattice of $P_w$ is precisely the set of index sets corresponding to the outside (resp. inside) faces of the subdifferentials $\partial f(F)$, for all $F \in \mathcal{F}(P_{w_k})$. The index lattice of $P_w$ is also precisely the set of index sets corresponding to the faces of $\partial f(F)$, for all $F \in \mathcal{F}(P_w)$.

Proof:

Let $F$ be an arbitrary face of $P_w$, with $w \in (w_k, w_{k+1})$, and $I(F)$ be the corresponding index set; as $F$ is nondegenerate, there exists a nonhorizontal face $F_Q$ of $Q$ such that $I(F_Q) = I(F)$.

If $F_Q$ extends below the level $w_k$, then the intersection of $F_Q$ with the horizontal hyperplane $w = w_k$ gives a face $F'$ of $P_{w_k}$ such that $I(F') = I(F_Q) = I(F)$, and $F'$ is nondegenerate; and $\partial f(F')$ is an outside face (of itself) such that $J(\partial f(F')) = I(F)$.

If $F_Q$ stops at $w_k$, let $F'$ be defined as before, and choose $x' \in \operatorname{ri} F'$; clearly $(x', w_k) \in F_Q$ and $I(x') = I(F') \supseteq I(F_Q) = I(F)$. Let $x \in \operatorname{ri} F$; then:
$(\langle a^i,x'\rangle - b_i)/\|a^i\| = w_k$, for all $i \in I(x')$,

$(\langle a^i,x\rangle - b_i)/\|a^i\| = w$, for all $i \in I(F) = I(x)$,

$(\langle a^i,x\rangle - b_i)/\|a^i\| < w$, for all $i \in I(x')$, $i \notin I(x)$;

hence

$\langle a^i, x - x'\rangle/\|a^i\| = w - w_k > 0$, for all $i \in I(x)$,

$\langle a^i, x - x'\rangle/\|a^i\| < w - w_k$, for all $i \in I(x')$, $i \notin I(x)$,

and the consideration of $(x - x')/(w - w_k)$ shows that $I(F)$ is the index set of an outside face of $\partial f(F')$, where $F'$ is a face of $P_{w_k}$.

For the converse, let $J = J(D)$ be the index set of an outside face $D$ of a subdifferential $\partial f(F')$, where $F'$ is a face of $P_{w_k}$. Let $x' \in \operatorname{ri} F'$, and $y$ be such that

$\langle a^i,y\rangle/\|a^i\| = 1$, $i \in J$,

$\langle a^i,y\rangle/\|a^i\| < 1$, $i \in I(F')$, $i \notin J$;

for $\varepsilon$ small enough (and positive), $I(x' + \varepsilon y) = J$ and $x' + \varepsilon y \in P_{w_k + \varepsilon}$. Now take $F$ to be the unique face of $P_{w_k + \varepsilon}$ such that $x' + \varepsilon y \in \operatorname{ri} F$; then $I(F) = I(x' + \varepsilon y) = J$. And Lemma 3.17 thus implies that, for all $w \in (w_k, w_{k+1})$, there exists a face $F_w \in \mathcal{F}(P_w)$ such that $I(F_w) = J$.

The proof for inside faces is similar.

The last statement of the lemma says that if $F$ is a nondegenerate face of $P_w$, and $D$ is a face of $\partial f(F)$, then $D$ is a side face of $\partial f(F)$, and, thus, there exists a face $F'$ of $P_w$ such that $D = \partial f(F')$. QED

Theorem 3.19

Let $\langle a^i,x\rangle \le b_i$, $i \in M$, be a system of linear inequalities, and

$P_w = \{x \in \mathbb{R}^n : (\langle a^i,x\rangle - b_i)/\|a^i\| \le w,\ i \in M\}$

be the family of perturbed polyhedra; then the condition numbers $\mu^*(P_w)$ and $\nu(P_w)$ have the following properties:

1. they are nondecreasing functions of $w$;

2. they are constant and equal between consecutive vertex levels;

3. at every vertex level $w_k$, $\nu$ is left continuous (or left constant) while $\mu^*$ is right continuous (or right constant), and $\mu^*(P_{w_k}) \ge \nu(P_{w_k})$.

The asphericities $\sigma'(P_w)$ and $\sigma(P_w)$ are nonincreasing functions of $w$, and

$\mu^*(P_w) \ge \nu(P_w) \ge (\sigma(P_w))^{-1} \ge (\sigma'(P_w))^{-1}$.


Proof:

Property 2 was given earlier, as Lemma 3.17, while property 1 follows from 2 and 3. So one only needs to show property 3; Lemma 3.11 shows that

$\mu^*(P_w) = \operatorname{Min}\{d(0;D) : D \text{ is an outside face of some } \partial f(F),\ F \in \mathcal{F}(P_w)\}$,

and thus Lemma 3.18 gives

$\mu^*(P_{w_k}) = \mu^*(P_w)$, for all $w \in [w_k, w_{k+1})$.

Similarly, $\nu(P_{w_k}) = \nu(P_w)$ for all $w \in (w_{k-1}, w_k]$; also Theorem 3.14 shows that $\mu^*(P_{w_k}) \ge \nu(P_{w_k})$, and thus property 3 is proved.

Now, for the asphericities, we will prove that $\sigma'(P_w)$ is a nonincreasing function of $w$ (a similar proof works for $\sigma(P_w)$).

By definition of $\sigma'(P_w)$, there exists an $x^* \in P^* = \{x : f(x) = \operatorname{Min}_y f(y)\}$ and $r^* = -f(x^*)$ such that

$x^* + (r^* + w)S \subseteq P_w \subseteq x^* + (r^* + w)\,\sigma'(P_w)\,S$;

note that the balls $x^* + (r^* + w)S$, for $x^* \in P^*$, are all the largest spheres contained in $P_w$ (and this for any $w$).

Now, using the definition of $\mu^*(P_w)$ given in §2, one has, for any $v > w$,

$P_v \subseteq P_w + (\mu^*(P_w))^{-1}(v - w)S$,

and

$x^* + (r^* + v)S \subseteq P_v \subseteq P_w + (\mu^*(P_w))^{-1}(v - w)S$

$\quad \subseteq x^* + \big[(r^* + w)\,\sigma'(P_w) + (\mu^*(P_w))^{-1}(v - w)\big]S$

$\quad \subseteq x^* + (r^* + v)\,\sigma'(P_w)\,S$ (using Theorem 3.14);

and thus $\sigma'(P_v) \le \sigma'(P_w)$. QED




This theorem essentially means that condition numbers and asphericities improve as the perturbation $w$ increases; this critically depends upon the assumption that the inequalities have been normalized before being perturbed, and it is not true if this assumption is not made. The normalization assumption also implies that every $n$ dimensional face of $Q$ extends up to $w = +\infty$, and that every singleton $\{i\}$, $i \in M$, belongs to $\mathcal{I}(Q)$ (if one assumes that no rows of $A$ are positively proportional).

Theorem  3.19  will  not  be  of  much  use  unless  a  compactification 
scheme  is  introduced,  which  makes  the  asphericities  finite. 

Compactification scheme 3.20

Define by $c(w)$ (and $c = c(0)$) a positive number such that the cube $\{x : \|x\|_\infty \le c(w)\}$ contains a point in the relative interior of every face of $P_w$. Then let $P^c_w = \{x \in P_w : \|x\|_\infty \le c(w)\}$; one clearly has

$\mu^*(P_w) \ge \nu(P_w) \ge \nu(P^c_w) \ge (\sigma(P^c_w))^{-1} \ge (\sigma'(P^c_w))^{-1}$,

where now $\sigma'(P^c_w) \in [1, +\infty)$.

This leads to the main result of this section, which says that, for any consistent system of linear inequalities with solution set $P$, the condition number $\mu^*(P)$, which measures the convergence rate of the maximal distance method, is bounded below by the inverse of the asphericity of a full dimensional and bounded set obtained from $P$ by a perturbation cum compactification.




Theorem 3.21

Let $\langle a^i,x\rangle \le b_i$, $i \in M$, be any consistent system of linear inequalities, with solution set $P$, and let

$P_w = \{x \in \mathbb{R}^n : (\langle a^i,x\rangle - b_i)/\|a^i\| \le w,\ i \in M\}$,

where $w \in (0, w_1)$, and

$P^c_w = P_w \cap \{x \in \mathbb{R}^n : \|x\|_\infty \le c(w)\}$,

which is always bounded and full dimensional; then

$\mu^*(P) = \nu(P_w) \ge \nu(P^c_w) \ge (\sigma(P^c_w))^{-1} \ge (\sigma'(P^c_w))^{-1}$.


4.  Finite  termination 

The  relaxation  methods  given  by  Algorithms  1  and  2  are  not,  in 
general,  finitely  convergent  procedures.  It  has  been  shown  by  Motzkin 
and  Schoenberg  [29],  Eaves  [9]  and  in  [13,14]  that,  if  dim  P  =  n, 
then  some  values  of  the  relaxation  parameter  (including  the  value  of 
2)  lead  to  finite  convergence;  no  bound  on  the  number  of  iterations 
has  been  proved,  and  thus  those  results  do  not  permit  a  discussion  of 
polynomiality. 

It  will  thus  be  necessary  to  stop  Algorithms  1  and  2  after  a 
finite  number  of  steps,  and  either  decide  that  P  is  empty,  or  go  to 
a  termination  routine  which  will  identify  a  point  x  in  P. 

If $w_1$ is the first positive vertex level of $Q$, then, if there exists an $x$ such that $f(x) < w_1$, there also exists an $x'$ such that $f(x') \le 0$, and thus $P$ is feasible; this is a geometric interpretation of results given in Chernikov [35] and Khacian [24,25]. Similar statements, with $f$ replaced by the other related functions and $w_1$ by the corresponding first positive vertex level, are of course valid.

It should be apparent that it is possible to design an $n$ step termination procedure, which is a descent method, starting from a point $x$ such that $f(x) < w_1$, which will follow faces of decreasing dimension; if the faces used decrease by one dimension at each iteration, then, after $n$ iterations, it must reach a vertex $x'$ such that $f(x') < w_1$, and thus $f(x') \le 0$.

Thus, we will now specify other termination criteria for Algorithm 1; something similar applies to Algorithms 1' and 2.

3.2. If $f(x) < w_1$, go to the termination routine.

3.3. If $q > \log\big(f(x^0)/(w_1\,\mu^*(P))\big) \big/ \log\big((1 - (\mu^*(P))^2)^{-1/2}\big)$, stop: $P$ is infeasible.

The stop in step 3.3 is correct, because if $q$ is as above, and if $P$ were feasible, then by Theorem 2.2, $f(x^q) < w_1$, so step 3.2 would already have passed control to the termination routine. This part of the algorithm is not quite implementable unless lower bounds on $w_1$ and $\mu^*(P)$ can be given, and this requires the assumption that the data is integer (see §5).
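The test in step 3.3 amounts to a small computation once (lower bounds on) $w_1$ and $\mu^*(P)$ are known. The sketch below assumes, as the denominator in step 3.3 suggests, that Theorem 2.2 gives $g(x^q) \le \theta^q g(x^0)$ with $\theta = (1 - (\mu^*(P))^2)^{1/2}$, together with $f \le g \le f/\mu^*(P)$; the function name `infeasibility_iteration_bound` is hypothetical.

```python
import math


def infeasibility_iteration_bound(f0, w1, mu):
    """Smallest integer q with q > log(f0/(w1*mu)) / log((1 - mu**2)**(-1/2)).

    Under the assumed rate, theta**q * f0 / mu < w1 then holds with
    theta = sqrt(1 - mu**2), so a feasible P would have triggered step 3.2.
    """
    if not (0.0 < mu < 1.0) or f0 <= 0.0 or w1 <= 0.0:
        raise ValueError("need f0 > 0, w1 > 0 and 0 < mu < 1")
    ratio = math.log(f0 / (w1 * mu)) / math.log((1.0 - mu ** 2) ** -0.5)
    return max(0, math.floor(ratio) + 1)


q = infeasibility_iteration_bound(f0=1.0, w1=0.1, mu=0.6)
theta = math.sqrt(1 - 0.6 ** 2)
print(q, theta ** q * 1.0 / 0.6 < 0.1)   # 13 True
```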

If the switch to the termination routine is based upon an incorrect value of $w_1$, then the termination routine will end at a vertex of $Q$, which may not be a point of $P$; it is clear that the relaxation method (or the simplex method) could be restarted from that point.

In  the  study  of  the  ellipsoid  method,  various  methods  have  been 
suggested,  in  order  to  find  an  exact  solution  from  an  approximate  one; 
some  proposals  involved  solving  m  systems  using  Algorithm  1  (Aspvall 
and  Stone  [3],  Khacian  [25]),  or  used  rationality  arguments  and 
continued  fractions  (Bland,  Goldfarb  and  Todd  [6]). 

The  termination  procedure  given  here  follows  more  closely  the 
ones  given  by  Goldfarb  and  Todd  [17],  and  Akgul  [2],  and  also  is  in 
the  spirit  of  a  proof  given  by  Gacs  and  Lovasz  [11]. 




It  is  essentially  a  projection  method,  or  a  rank  deflating 
ellipsoid  method  [16],  and  is  similar  to  the  algorithms  of  Lemke  [27], 
Rosen  [31],  Zoutendijk  [39,40],  Gill  and  Murray  [12],  Cline  [7]  and 
Bartels,  Conn  and  Charalambous  [4],  but  with  special  attention  being 
paid  to  the  kind  of  degeneracy  which  has  been  defined  in  Section  3. 

In order to keep the notation a bit less cumbersome, the termination routine will be described using the function $f$, but it is easy to translate the routine in terms of the other functions $\tilde f$, $\hat f$, $\phi$, $\hat\phi$, etc.

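Termination routine 4.1

1. Set $K = K_0$, a positive definite symmetric matrix (possibly an identity, or the matrix $H$ of Algorithm 2),
$x = x^0$ (where $f(x^0) < w_1$),
$z = 0$, $p = 0$.

2. Is $\langle a^i,z\rangle = 1$ for all $i \in I(x)$, where $I(x) = \{i \in M : \langle a^i,x\rangle - b_i = f(x)\}$?
If yes, go to 4 (linesearch);
if no, go to 3 (update).
(An alternative test would be: is $\langle a^i,z\rangle \ge 1$ for all $i \in I(x)$?)

3. Update

Select $i \in I(x)$ such that $\langle a^i,z\rangle \ne 1$ (or $\langle a^i,z\rangle < 1$ for the alternative test of step 2); when the assumption $f(x^0) < w_1$ holds, one always has $Ka^i \ne 0$; set

$z_+ = z + (1 - \langle a^i,z\rangle)\, K a^i/(a^{i\,t} K a^i)$,

$K_+ = K - \dfrac{K a^i a^{i\,t} K}{a^{i\,t} K a^i}$;

set $z \leftarrow z_+$, $K \leftarrow K_+$, $p \leftarrow p + 1$, and go to step 2.

4. Linesearch

$\delta = \operatorname{Max}\{\delta' : f(x) - f(x - \delta' z) = \delta'\}$

$\quad = \operatorname{Min}\{(f(x) - \langle a^j,x\rangle + b_j)/(1 - \langle a^j,z\rangle) : j \in \{i \in M : \langle a^i,z\rangle < 1\}\}$

(note that $\delta > 0$).

If $\delta = +\infty$, then $\operatorname{Min}\{f(y) : y \in \mathbb{R}^n\} = -\infty$, and $x' = x - \delta' z$ solves $P$ for any $\delta' \ge f(x)$;

set $x_+ = x - \delta z$; if $f(x_+) \le 0$, stop: $x_+ \in P$;
otherwise, set $x \leftarrow x_+$ and go to 2.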
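A compact NumPy rendering of Termination routine 4.1 may help in following the proof of Theorem 4.2. It is a sketch under stated assumptions: the rows $a^i$ are pre-normalized to unit length (so that $I(x)$ can be read off $\langle a^i,x\rangle - b_i = f(x)$ as in step 2), the first test of step 2 is used, ties are detected with a fixed tolerance, and the function name, return values and safety cap are hypothetical additions.

```python
import numpy as np


def termination_routine(A, b, x0, K0=None, tol=1e-9):
    """Sketch of Termination routine 4.1 (first test of step 2).

    Assumes unit-norm rows, so f(x) = max_i (<a^i, x> - b_i).
    Returns (x, status, p), p being the number of updates in step 3.
    """
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = A.shape[1]
    x = np.array(x0, dtype=float)
    K = np.eye(n) if K0 is None else np.array(K0, dtype=float)
    z = np.zeros(n)
    p = 0

    def f(y):
        return np.max(A @ y - b)

    for _ in range(4 * n + 4):                       # safety cap (theory: <= n updates)
        fx = f(x)
        if fx <= tol:
            return x, "in P", p
        active = np.flatnonzero(A @ x - b >= fx - tol)          # I(x)
        pending = [i for i in active if abs(A[i] @ z - 1.0) > tol]
        if pending:                                             # step 3: update
            i = pending[0]
            Ka = K @ A[i]
            aKa = A[i] @ Ka
            if aKa <= tol:                                      # K a^i = 0: "failure"
                return x, "failure (horizontal face of Q)", p
            z = z + (1.0 - A[i] @ z) * Ka / aKa
            K = K - np.outer(Ka, Ka) / aKa                      # K symmetric: K a a^t K
            p += 1
            continue
        slopes = A @ z                                          # step 4: linesearch
        cand = slopes < 1.0 - tol
        if not np.any(cand):                                    # delta = +infinity
            return x - fx * z, "in P (f unbounded below)", p
        delta = np.min((fx - (A[cand] @ x - b[cand])) / (1.0 - slopes[cand]))
        x = x - delta * z
    raise RuntimeError("no termination within the safety cap")


# Triangle x1 <= 0, x2 <= 0, -(x1 + x2)/sqrt(2) <= 10/sqrt(2), started outside P.
A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
A[2] /= np.sqrt(2.0)
b = np.array([0.0, 0.0, 10.0 / np.sqrt(2.0)])
x, status, p = termination_routine(A, b, [2.0, 1.0])
print(x, status, p)
```

In this example the routine performs two updates and ends at the vertex of $Q$ that lies below the level $w = 0$, i.e. at a point of $P$.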


Theorem 4.2

If the assumption that $f(x^0) < w_1$ is correct, then the termination routine stops after at most $n$ steps, with a point $x$ which solves $P$; if the assumption is incorrect, then the routine may "fail" in step 3, with $Ka^i = 0$, and this implies that $x$ belongs to a horizontal face of $Q$.


Proof:

Let $A_p = (a^{i_0}, a^{i_1}, \ldots, a^{i_{p-1}})^t$; then it has been shown in [16] that

$K_p = K_0 - K_0 A_p^t (A_p K_0 A_p^t)^{-1} A_p K_0$,

$A_p K_p = 0$, $\operatorname{rank} A_p = p$, $\operatorname{rank} K_p = n - p$, and also that $K_p a = 0$ if and only if $a = A_p^t \lambda$.

Also, by induction on $p$, one has $A_p z_p = 1_p$ (a $p$ dimensional vector of ones) and $z_p = K_0 A_p^t (A_p K_0 A_p^t)^{-1} 1_p$; in fact $z_p$ is the unique solution of $A_p z_p = 1_p$ which belongs to $R(K_0 A_p^t)$. Also, by induction, $A_p x^p - b^{(p)} = f(x^p)\, 1_p$ (where $b^{(p)} = (b_{i_0}, \ldots, b_{i_{p-1}})^t$), and thus $I(x^p) \supseteq \{i_0, \ldots, i_{p-1}\}$; if one denotes by $F_Q(x^p)$ the face of $Q$ such that $I(F_Q(x^p)) = I(x^p)$ (clearly the smallest face of $Q$ containing $(x^p, f(x^p))$), then $\dim F_Q(x^p) \le n - p$.

Every $a \in \mathbb{R}^n$ may be decomposed uniquely as $a = A_p^t \lambda + a_N$, where $a_N \in N(A_p K_0)$ (where $N$ means null space); hence $a^t z_p = \lambda^t 1_p$ and $K_p a = K_p a_N$ (note that $N(K_p) = R(A_p^t)$).

Thus failure will occur in step 3 if and only if $\langle a^i, z_p\rangle \ne 1$ and $K_p a^i = 0$; this is true if and only if $a^i - A_p^t \lambda = 0$ and $1 - \lambda^t 1_p \ne 0$, which implies (by Definition 3.16 and a theorem of the alternative) that $F_Q(x^p)$ is horizontal. Note that if the first alternative for step 2 is used, then failure must occur at a horizontal face, while this need not be the case with the second alternative.


Now if $f(x^0) \in (0, w_1)$, as the linesearch stops the algorithm if $f(x^p) \le 0$, it follows that in steps 2 and 3 $f(x^p) \in (0, w_1)$, and hence $F_Q(x^p)$ is a nonhorizontal face, and thus failure cannot occur in step 3. After at most $n$ steps $F_Q(x^n)$ has dimension zero, and thus is a vertex of $Q$, implying that $f(x^n) \le 0$; thus the algorithm stops after $p$ steps ($p \le n$) at a point $x^p$ which belongs to $P$. If the algorithm takes the full $n$ steps, then it terminates at a vertex of $Q$, and $K_n = 0$.

If $f(x^0)$ was not smaller than $w_1$, then failure may occur in step 3, for some $p < n$, but then $F_Q(x^p)$ is a horizontal face of $Q$. QED
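The identities quoted from [16] at the start of the proof can be checked numerically: after $p$ rank-one updates of step 3 (starting from $z = 0$), $K_p$ and $z_p$ should agree with the closed forms $K_0 - K_0A_p^t(A_pK_0A_p^t)^{-1}A_pK_0$ and $K_0A_p^t(A_pK_0A_p^t)^{-1}1_p$, with $A_pK_p = 0$, $A_pz_p = 1_p$ and $\operatorname{rank} K_p = n - p$. The random data below is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 3
B = rng.standard_normal((n, n))
K0 = B @ B.T + np.eye(n)               # positive definite symmetric K_0
Ap = rng.standard_normal((p, n))       # rows a^{i_0}, ..., a^{i_{p-1}}

K, z = K0.copy(), np.zeros(n)
for a in Ap:                           # the updates of step 3, in order
    Ka = K @ a
    aKa = a @ Ka
    z = z + (1.0 - a @ z) * Ka / aKa
    K = K - np.outer(Ka, Ka) / aKa

G = Ap @ K0 @ Ap.T                     # A_p K_0 A_p^t
K_closed = K0 - K0 @ Ap.T @ np.linalg.solve(G, Ap @ K0)
z_closed = K0 @ Ap.T @ np.linalg.solve(G, np.ones(p))

print(np.allclose(K, K_closed), np.allclose(z, z_closed))
print(np.allclose(Ap @ K, 0), np.allclose(Ap @ z, 1), np.linalg.matrix_rank(K))
```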


If $\operatorname{rank} A < n$, then the termination routine stops after at most $\operatorname{rank} A$ steps, and it is probably more natural to restrict $K_0$ to satisfy $R(K_0 A^t) = R(A^t)$ (so that $x^p \in x^0 + R(A^t)$).

The termination routine may be amended so that when "failure" occurs at iteration $p$, one searches for, and finds, a vertex of $Q$ after at most $n - p$ additional iterations. This is done by updating $K_p$ (using the same formula) until $K_p a^i = 0$ for all $i \in I(x^p)$, and by then doing a horizontal line search in the direction $-K_p a^j$ (where $j \in M$, $j \notin I(x^p)$); the line search is called horizontal because it is done in the null space of $A_p$, and thus the function $f$ remains constant.


If the matrix $K_0$ is the identity, then the termination routine is a projection method [4,5,7,12,27,31,39,40], while, if $K_0 = H = TT^t$, it is a projection method within the space of the variable $y$ ($x = Ty$), which uses the function $f(y,T)$. It seems reasonable to expect that using $K_0 = H$, where $H$ is the matrix used in Algorithm 2, or a matrix generated by the ellipsoid method, may be beneficial.

It should be pointed out that the termination routine, while it decreases $\tilde f$ (or $f$, if one used $f$ to guide the descent), does not necessarily decrease $g$; in fact, except in special cases (see Todd [33]), the direction $z_p$ is not a subgradient of $f$ at $x^p$.

This means that convergence cannot be guaranteed if one switches too frequently between the relaxation method and the termination routine (when the latter fails). By "not too frequently" we mean the following: the convergence theory of Theorem 2.2 indicates that if $q$ is large enough so that $\theta^q \le \mu^*(P)/\varepsilon$, with $\varepsilon > 1$, then it is guaranteed that, after $q$ steps of the maximal distance relaxation method, one has $f(x^q) \le g(x^q) \le \varepsilon^{-1}\mu^*(P)\,g(x^0)$, and thus, after $n$ additional steps of the termination routine (applied to $f(x)$), one has

$$g(x^{q+n}) \le f(x^{q+n})/\mu^*(P) \le f(x^q)/\mu^*(P) \le \varepsilon^{-1}g(x^0).$$

The rate of convergence is badly affected (in theory) unless $\varepsilon$ is taken to be of the order of $(\mu^*(P))^{-1}$, in which case it is reduced by about half.
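To make this trade-off concrete, the short Python sketch below counts how many relaxation steps are needed before $\theta^q \le \mu^*(P)/\varepsilon$. Since the contraction factor of Theorem 2.2 is not restated in this section, it assumes $\theta = (1 - \mu^*(P)^2)^{1/2}$, the value consistent with the iteration count used in the proof of Theorem 5.2; the helper name `steps_needed` is ours.

```python
import math

def steps_needed(mu, eps):
    """Smallest q with theta**q <= mu/eps, assuming theta = sqrt(1 - mu**2)."""
    rate = -0.5 * math.log1p(-mu * mu)   # -ln(theta), accurate for small mu
    return math.ceil(math.log(eps / mu) / rate)

mu = 1e-3
q_min = steps_needed(mu, 1.0 + 1e-9)   # epsilon barely above 1
q_safe = steps_needed(mu, 1.0 / mu)    # epsilon of the order of 1/mu*
print(q_min, q_safe, q_safe / q_min)   # the ratio is close to 2
```

With $\varepsilon$ of the order of $(\mu^*(P))^{-1}$ the required $q$ roughly doubles, which is the "reduced by about half" effect on the rate.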


Another possibility, after the termination routine finds a vertex of $Q$, would be to switch to the simplex method; note that if one updated $K_pA_p^t(A_pK_pA_p^t)^{-1}$ rather than $K_p$, then at step $n-1$ one would have computed $A_n^{-1}$, and thus the information needed for the simplex method would be available.

An interesting feature of the termination routine is that it can be implemented when the function $f$ and its subgradients are given by an oracle, or by the solution of subproblems (as is done when one uses Dantzig-Wolfe decomposition, or other similar schemes). Under such a description of the function $f$ one usually assumes that the sets $M$ and $I(x)$ cannot be listed (or even reasonably computed); but the condition (in step 2) $(a^i, z) \ge 1$ for all $i \in I(x)$ can be tested, and an $a^j$ such that $j \in I(x)$ and $(a^j, z) < 1$ can be generated, by taking a small ("null") step in the direction $z$. By a null step is meant any step $\delta$ such that if $j \in I(x - \delta z)$, then $f(x) - f(x - \delta z) = \delta(a^j, z)$; then clearly $j \in I(x)$, and $(a^j, z) < 1$. It is possible to implement, at the level of the master problem, a line search which will find such a $\delta$, but it may take a large though finite (i.e., exponential in the size of the problem) number of steps; usually, however, the line search can be performed in polynomial time (i.e., polynomial in the size of the problem) at the subproblem level (within the oracle).


5.  Integrality 

The assumption that the data are integer is needed for two reasons:

1. to give computable bounds on $w_1$ and $\mu^*(P)$, so as to guarantee that the switch to the termination routine is not done prematurely;

2. to guarantee that all numbers needed in the computation can be represented in a space which is polynomial in the length of the input ($L$).

The sequence $\{x^q\}$ generated by Algorithm 1 is not integer, unless (see Eaves [9]) each $a^i$ has components 0, 1 and $-1$, at most two of them being nonzero, and one uses a relaxation parameter equal to 2.

In what follows, we will use

$$a = \max\{|a^i|_\infty: i \in M\} \quad\text{and}\quad b = |b|_\infty,$$

rather than $L$, the length of the input.

Lemma 5.1

If Algorithm 1 is polynomial time, then all numbers in the computation may be represented as ratios of integers of polynomial size (i.e., representable in polynomial space). The same holds for Algorithms 1' and 2 if $T$ and $H$ are integer matrices of polynomial size.


Proof:

For Algorithm 1, let $e_q = \prod_{j=0}^{q-1}|a^{i_j}|^2$ (where $i_j$ is the index of the constraint used at step $j$); then $x^qe_q$ is an integer vector (assuming $x^0 = 0$, or integer). Let $\delta_q = |x^qe_q|_\infty$; then $x^q$ is the ratio of an integer vector $x^qe_q$, of size at most $\delta_q$, and an integer scalar $e_q$.

Now

$$x^{q+1}e_{q+1} = (x^qe_q)\,|a^{i_q}|^2 - \big((a^{i_q}, x^q) - b_{i_q}\big)e_q\,a^{i_q},$$

hence

$$\delta_{q+1} \le 2n\,\delta_q\,a^2 + e_q\,b\,a$$

(using $|a|^2 \le n|a|_\infty^2$, and $|(a,x)|\,|a|_\infty \le |a|\,|x|\,|a|_\infty \le n|x|_\infty|a|_\infty^2$); also $e_q \le (na^2)^q$.

Proceeding by induction on $q$, one is led to (assuming $\delta_0 = 0$):

$$\delta_q \le (2^q - 1)(na^2)^{q-1}\,b\,a \le (2na^2)^q\,b;$$

and thus

$$\log_2\delta_q \le q\log_2(2na^2) + \log_2 b, \qquad \log_2 e_q \le q\log_2(na^2),$$

and the first part of the lemma is satisfied for the relaxation phase of Algorithm 1.


The termination routine lends itself to the same type of analysis, provided $K_0$ is integer and of polynomial size, and one does not use $f(x)$, $f(y,T)$ or $\phi(x,T)$, which introduce irrational numbers, but $\tilde f$, $\tilde f(y,T)$, $\tilde\phi(x,T)$, etc. The representations of $K_p$ and $z_p$ make it clear that both may be represented by ratios of integers of polynomial size (using the same reasoning as above). The line search can also be performed using integer arithmetic, and then $\delta$ and $x^p$ may also be represented as ratios of integers of polynomial size.

For Algorithm 1', the results extend if one interprets $a$ as the largest entry, in absolute value, of $AT$; for Algorithm 2 the relationship $x = Ty$ shows that the lemma is also correct. QED

The function $f(x) = \max_{i\in M}\big(((a^i,x) - b_i)/|a^i|\big)$ may be used in the relaxation phase of the algorithm, as the set $I(x)$ may be computed by using integer arithmetic:

$$I(x) = \{i \in M: ((a^i,x) - b_i)^2/|a^i|^2 = f^2(x),\ (a^i,x) - b_i > 0\}.$$
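As a minimal sketch of how $I(x)$ can be obtained without square roots, assume (for illustration) that $x$ is given as $X/d$ with $X$ an integer vector and $d > 0$ an integer. Then $d((a^i,x) - b_i)$ is an integer, and the squared ratios can be compared by cross-multiplication. The name `active_set` is hypothetical.

```python
def active_set(A, b, X, d):
    """I(x) for the rational point x = X/d (X integer vector, d > 0 integer),
    computed with integer arithmetic only."""
    # s_i = d * ((a^i, x) - b_i) is an integer with the sign of the residual
    s = [sum(aj * Xj for aj, Xj in zip(row, X)) - bi * d for row, bi in zip(A, b)]
    norm2 = [sum(aj * aj for aj in row) for row in A]
    pos = [i for i in range(len(A)) if s[i] > 0]
    if not pos:
        return []  # no violated constraint: x is feasible
    # f^2(x) = max_i s_i^2 / (d^2 |a^i|^2); compare fractions by cross-multiplying
    k = pos[0]
    for i in pos[1:]:
        if s[i] ** 2 * norm2[k] > s[k] ** 2 * norm2[i]:
            k = i
    return [i for i in pos if s[i] ** 2 * norm2[k] == s[k] ** 2 * norm2[i]]

print(active_set([[1, 0], [0, 1], [1, 1]], [0, 0, 0], [1, 1], 1))  # [2]
```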

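The estimates 5.2

1. $w_1 \ge 2^{-(n+1)/2}\min\Big\{\prod_{i\in I}|a^i|^{-1}: I \subset M,\ |I| = n+1\Big\} \ge (2na^2)^{-(n+1)/2}$.

Proof:

Every nonzero vertex level $w$ solves

$$(a^i, x) - b_i = |a^i|\,w, \quad i \in I \subset M,\ |I| = n+1;$$

hence, by Cramer's rule, $|w| = |\det(A_I, b_I)|/|\det(A_I, -\nu_I)|$, where $A_I$ has rows $a^i$, $i \in I$, and $\nu_I$ has entries $|a^i|$. Using the fact that the numerator is nonzero and integer, and Hadamard's inequality for the denominator, one gets

$$w \ge 1\Big/\prod_{i\in I}\big(|a^i|^2 + |a^i|^2\big)^{1/2}. \quad \text{QED}$$

The same proof, using an orthogonal transformation, shows that, if Assumption 3.1 does not hold, the same result holds with $\operatorname{rank} A$ replacing $n$. Also the same estimate is valid for $w_{-1}$, the first negative vertex level.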


2. $\tilde w_1 \ge \min\Big\{\prod_{i\in I}(1 + |a^i|^2)^{-1/2}: I \subset M,\ |I| = n+1\Big\} \ge (1 + na^2)^{-(n+1)/2}$.

3. $\hat w_1 \ge (2na^2)^{-(n+1)/2}$.

4. If $A$ is totally unimodular, then

$$w_1 \ge \frac{1}{(n+1)n^{1/2}}, \qquad \hat w_1 \ge \frac{1}{(n+1)n^{1/2}}, \qquad \tilde w_1 \ge \frac{1}{n+1}$$

(this is done by expanding the bottom determinant with respect to its last column, and noticing that all $n+1$ minors are $-1$, 0 or 1).

If one had written a totally unimodular linear program as a system of linear inequalities (by using primal and dual constraints, and a reverse weak duality constraint), the resulting system would not be totally unimodular (because of the weak duality row), but estimates of $w_1$ are still polynomial in $n$, and in the size of the numbers involved in the right hand side and in the objective.


5. $w_1(T)$, which is identical for $f(y,T)$ and $\phi(x,T)$:

$$w_1(T) \ge w_1\,\det{}^{1/2}H/(\Lambda(H))^{(n+1)/2}.$$

Proof:

Solving

$$(a^i, Ty) - b_i = w\,|T^ta^i|, \quad i \in I \subset M,\ |I| = n+1,$$

one gets

$$w \ge \det T\Big/\prod_{i\in I}\big(2|T^ta^i|^2\big)^{1/2}$$

(using the fact that $\det(A_IT, b_I) = \det T\cdot\det(A_I, b_I)$)

$$\ge \det T\Big/\Big((2\Lambda(H))^{(n+1)/2}\prod_{i\in I}|a^i|\Big). \quad \text{QED}$$


6. $\mu^*(P) = v(P_w)$ (where $w = 0$ if $P$ is full dimensional, while if not, $w \in (0, w_1]$):

$$\mu^*(P) = v(P_w) \ge n^{-1/2}e^{-1/2}\min\Big\{\prod_{i\in I}|a^i|^{-1}: I \subset M,\ |I| \le n\Big\} \ge e^{-1/2}n^{-(n+1)/2}a^{-n}.$$

Proof:

Using the last definition of $v$ (definition 3.4),

$$v = \min\{(\lambda^t\Gamma\lambda)^{1/2}/|\lambda|_1: \lambda \ge 0,\ \lambda \ne 0\}$$

for some nonsingular Gramian $\Gamma$ built on the vectors $a^i/|a^i|$ belonging to an index set $I$, $|I| \le n$; hence

$$v \ge \min\{(\lambda^t\Gamma\lambda)^{1/2}/(n^{1/2}|\lambda|)\} \ge n^{-1/2}\lambda_{\min}^{1/2}(\Gamma)$$

(where $\lambda_{\min}$ denotes the smallest eigenvalue). Now (see [16])

$$\lambda_{\min}(\Gamma) \ge \det\Gamma\big/(\operatorname{Tr}\Gamma/(n-1))^{n-1},$$

and $\operatorname{Tr}\Gamma = |I| \le n$, while

$$\det\Gamma = \det(A_IA_I^t)\Big/\prod_{i\in I}|a^i|^2,$$

where $A_I$ is a matrix whose rows are $a^i$, $i \in I$ (it is integer, so that $\det(A_IA_I^t) \ge 1$); thus

$$\det\Gamma \ge \prod_{i\in I}|a^i|^{-2}.$$

Hence

$$\lambda_{\min}(\Gamma) \ge \prod_{i\in I}|a^i|^{-2}\Big/(n/(n-1))^{n-1} \ge e^{-1}\prod_{i\in I}|a^i|^{-2},$$

and thus

$$v \ge e^{-1/2}n^{-1/2}\min\Big\{\prod_{i\in I}|a^i|^{-1}: I \subset M,\ |I| \le n\Big\} \ge e^{-1/2}n^{-(n+1)/2}a^{-n}.$$


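For totally unimodular problems, $|a^i| \le n^{1/2}$, and the estimate becomes

$$\mu^*(P) = v(P_w) \ge e^{-1/2}n^{-(n+1)/2}. \quad \text{QED}$$

7. The compactification scheme constant $c(w)$, where $w \in [0, w_1]$, may be taken as any value satisfying

$$c(w) \ge n(n^{1/2}a)^{n-1}(b + n^{1/2}a\,w).$$

Proof:

The constant $c(w)$ was defined so that $P_w \cap \{x: |x|_\infty \le c(w)\}$ contains at least one point in the relative interior of every face of $P_w$. As every face of $P_w$ contains at least one vertex (under Assumption 3.1), one may take any $c(w) \ge \max\{|x|_\infty: x \text{ is a vertex of } P_w\}$.

Every vertex of $P_w$ is the solution of

$$(a^i, x) - b_i = |a^i|\,w, \quad i \in I \subset M,\ |I| = n,$$

where $w$ is a constant. Using Cramer's rule (where $x_k$ is the $k$th coordinate of $x$), $|x_k|$ is the absolute value of a determinant whose $k$th column is $(b_i + |a^i|w)_{i\in I}$ and whose other columns come from $A_I$, divided by $|\det A_I| \ge 1$; expanding along the $k$th column and applying Hadamard's inequality to the $(n-1)\times(n-1)$ minors gives

$$|x_k| \le n(n^{1/2}a)^{n-1}\max_{i\in I}\big(|b_i| + |a^i|\,w\big),$$

and thus

$$c(w) \ge n(n^{1/2}a)^{n-1}(b + n^{1/2}a\,w)$$

suffices. QED

If $A$ is totally unimodular, then $c(w) \ge n(b + n^{1/2}w)$ suffices.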
8. $f(0) \le b$.
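The two facts about the Gramian that drive estimate 6 are easy to check numerically. The Python sketch below builds $\Gamma$ from integer rows $a^i$ normalized to unit length. It returns $\lambda_{\min}(\Gamma)$ together with $\det\Gamma/(\operatorname{Tr}\Gamma/(k-1))^{k-1}$, and $\det\Gamma$ together with $\prod|a^i|^{-2}$, where $k = |I|$ plays the role of $n$ in the proof. The helper name `check_gram_bound` and the sample rows are ours.

```python
import numpy as np

def check_gram_bound(rows):
    """Hypothetical helper for the proof of estimate 6. Gamma is the Gram matrix
    of the rows a^i / |a^i| (k >= 2 linearly independent integer rows).
    Returns (lambda_min, det/(Tr/(k-1))^(k-1), det, prod |a^i|^-2)."""
    A_I = np.asarray(rows, dtype=float)
    k = A_I.shape[0]
    norms = np.linalg.norm(A_I, axis=1)
    U = A_I / norms[:, None]
    Gamma = U @ U.T                        # unit diagonal, so Tr Gamma = k
    lam_min = np.linalg.eigvalsh(Gamma)[0]  # eigenvalues come back ascending
    det = np.linalg.det(Gamma)
    eig_bound = det / (np.trace(Gamma) / (k - 1)) ** (k - 1)
    det_bound = np.prod(norms ** -2.0)     # because det(A_I A_I^t) >= 1 for integer A_I
    return lam_min, eig_bound, det, det_bound

lam, lb, det, db = check_gram_bound([[1, 2, 0], [0, 1, 3], [2, 0, 1]])
print(lam >= lb, det >= db, lam >= np.exp(-1) * db)
```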

All constants computed are exponential in the length of the input, and thus their logarithms are polynomial (and hence they are representable in polynomial space); the bounds on $w_1$ and $c(w)$ are polynomial in $n$ and $b$ if one assumes total unimodularity of the matrix $A$.

But the constant that really matters, $\mu^*(P)$, as it determines the rate of convergence (or the polynomiality) of the algorithm, is bounded below only by an exponentially small quantity, and it seems hard, though not impossible, to find better estimates except on very special classes of problems (total unimodularity is not special enough).

It  should  be  noted  that  m,  the  number  of  constraints,  does  not 
appear  anywhere. 
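To illustrate that the logarithms of these constants are polynomial, the following Python sketch evaluates $\log_2$ of the bounds from estimates 1, 6 and 7. Each grows like $n\log(na)$ plus $\log b$, i.e., polynomially in the input length, even though the constants themselves are exponential. The function name and the sample values of $n$, $a$, $b$ are ours.

```python
import math

def log2_constants(n, a, b, w=0.0):
    """Hypothetical helper: log2 of the bounds in estimates 1, 6 and 7, with
    a = max_i |a^i|_inf and b = |b|_inf; requires b + sqrt(n)*a*w > 0."""
    log2_w1 = -(n + 1) / 2 * math.log2(2 * n * a * a)           # lower bound on w_1
    log2_mu = (-0.5 * math.log2(math.e)                          # lower bound on mu*(P)
               - (n + 1) / 2 * math.log2(n) - n * math.log2(a))
    log2_c = (math.log2(n) + (n - 1) * math.log2(math.sqrt(n) * a)
              + math.log2(b + math.sqrt(n) * a * w))             # admissible c(w)
    return log2_w1, log2_mu, log2_c

for n in (10, 100, 1000):
    print(n, [round(v, 1) for v in log2_constants(n, a=10, b=100, w=0.0)])
```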

Theorem 5.2

Let $Ax \le b$, where $A$ and $b$ are integer, be a system of linear inequalities; then the maximal distance relaxation method (Algorithm 1) will decide that the system is infeasible, or find an exact solution to it, in polynomial time (and polynomial space) if $(\mu^*(P))^{-1}$ is a polynomial of $n$, the dimension of the space.

The variable metric, maximal ellipsoidal distance, relaxation method (Algorithm 2) with ellipsoid matrix $H = TT^t$ will do the same if $(\mu^*(T^{-1}P))^{-1}$ is a polynomial of $n$, if $\log(\Lambda(H)/\lambda(H))$ is polynomial in the length of the input, and, for polynomial space, if $T$ (or $H$) is a matrix of integers of polynomial size.

Proof:

Using the fact that $\ln(1 - x^2)^{-1/2} \ge x^2/2$, the convergence Theorem 2.2 and the analysis 4.1 of the termination routine, it is clear that Algorithm 1 will take at most

$$n + \frac{2}{(\mu^*(P))^2}\ln\frac{f(0)}{w_1\,\mu^*(P)}$$

iterations to solve the problem. Because of the estimates 5.2, the logarithm is polynomial in the length of the input, and the algorithm is polynomial time if $(\mu^*(P))^{-1} \le n^k$ for some nonnegative $k$.

The issue of polynomial space is easily settled by using Lemma 5.1, and, for instance, by using the function $f$.

For Algorithm 2, Theorem 2.4 indicates that the number of iterations will not exceed

$$n + \frac{2}{(\mu^*(T^{-1}P))^2}\ln\frac{f(0)\,\Lambda^{1/2}(H)}{w_1\,\mu^*(T^{-1}P)\,\lambda^{1/2}(H)};$$

hence the theorem follows similarly. QED
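A small helper makes the role of $\mu^*(P)$ in the bound for Algorithm 1 visible. In the Python sketch below (the name `iteration_bound` is ours), $w_1$ is passed as $\ln w_1$ so that exponentially small lower bounds do not underflow. For comparison, the usage plugs in the worst-case estimate 6 and also $\mu = 1/n$, the value Sections 6 and 7 show to be attainable after a suitable linear transformation; the extra $\Lambda/\lambda$ term of Algorithm 2 is ignored here.

```python
import math

def iteration_bound(n, mu, f0, log_w1):
    """n + (2/mu^2) ln(f(0)/(w_1 mu)), with w_1 given through ln(w_1)."""
    return n + (2.0 / mu**2) * (math.log(f0) - log_w1 - math.log(mu))

n = 20
log_w1 = -(n + 1) / 2 * math.log(2 * n)          # estimate 1 with a = 1
mu_worst = math.exp(-0.5) * n ** (-(n + 1) / 2)  # estimate 6 with a = 1
print(iteration_bound(n, mu_worst, 1.0, log_w1))  # exponential in n
print(iteration_bound(n, 1.0 / n, 1.0, log_w1))   # polynomial in n
```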

This theorem, together with the estimates 5.2, indicates that, except in very special cases, Algorithm 1 is exponential, and this may mean exceedingly bad behavior in practice. It works well on some classes of problems, if $\mu^*(P)$ is not too small; it has been conjectured that $(\mu^*(P))^{-1}$ is a polynomial of $n$ for assignment problems, but this does not seem to extend to general totally unimodular problems.


Algorithm 2 shows a potential for improving over Algorithm 1; in fact, we will show in Sections 6 and 7 that there always exists a linear transformation $T$ such that $\mu^*(T^{-1}P) \ge 1/n$, and, in a sequel to this paper, that the ellipsoid method [2,3,6,11,16,17,18,24,25,32,36,37] is, in fact, a (polynomial) algorithmic procedure to identify such a $T$.
6. Ellipsoids

A well known result due to John [23] says that the set of linear transformations $\mathcal{T}(P) = \{T: \sigma(T^{-1}P) \le n,\ T \text{ nonsingular}\}$ is not empty if $P$ is full dimensional, compact and convex ($P$ need not be a polyhedron). We will denote by

$$E_\# = e_\# + T_\#S \quad (\text{resp. } E^* = e^* + T^*S)$$

the largest volume (resp. smallest volume) ellipsoid contained in (resp. containing) $P$. Both ellipsoids are unique; see John [23], Danzer-Laugwitz-Lenz [8], Zaguskin [38] and Grünbaum [19].

The fact that $T^* \in \mathcal{T}(P)$ has been proved by John [23], while $T_\# \in \mathcal{T}(P)$ is alluded to by Grünbaum [19, p. 241].

Both  these  results  may  be  interpreted  in  terms  of  the  affine 
excentricity  (or  affine  asphericity,  or  aellipsoidality)  of  a  convex 
set  P. 

Definition 6.1

The affine excentricities of $P$, where $P$ is compact, convex and full dimensional, are

$$\tau(P) = \inf\{\sigma(T^{-1}P): T \text{ nonsingular}\} = \inf\{t > 0: x + E \subset P \subset x + tE \text{ for some } x\},$$

where $E$ stands for any ellipsoid centered at the origin (note that the second definition makes sense even if $1 \le \dim P \le n-1$), and

$$\tau_\#(P) = \inf\{\tau > 0: P \subset e_\# + \tau T_\#S\};$$

this definition is sometimes easier to use or to characterize; one has $\tau_\#(P) = \sigma(T_\#^{-1}P)$.

Both affine excentricities are invariant under affine transformations, while $\tau(P)$ is the least asphericity of any member of the class of affine transforms of $P$; also $\tau(P) \le \tau_\#(P)$.

Thus, for any compact, convex and full dimensional set $P$, $1 \le \tau(P) \le \tau_\#(P) \le n$; furthermore $\tau(P) = 1$, or $\tau_\#(P) = 1$, if and only if $P$ is an ellipsoid, while $\tau(P) = \tau_\#(P) = n$ if and only if $P$ is a simplex. If $P$ is centrally symmetric, then $1 \le \tau(P) \le \tau_\#(P) \le \sqrt n$, and, for centrally symmetric $P$, $\tau(P) = \sqrt n$, or $\tau_\#(P) = \sqrt n$, if and only if $P$ is a parallelotope.

One somewhat interesting fact to notice is that the function $\phi(x, T_\#)$ always has a unique minimum at $x = e_\#$, and $\phi(e_\#, T_\#) = -1$ (under the assumption that $P$ is bounded and full dimensional); this follows from the definition of $\phi$ and the uniqueness of $E_\#$.

The ellipsoid principle (which we shall call primal), with shallow cuts, as given in Yudin and Nemirovskii [36,37] and Todd [34], and a dual ellipsoid principle, can be used to prove that $\mathcal{T}(P)$ is not empty, in a way which is somewhat constructive; by this, we mean that the (primal) ellipsoid method and a dual ellipsoid method actually construct approximations to $T^*$ and $T_\#$ (this statement will be proved in a sequel to this paper).


The primal ellipsoid principle 6.2 [34,36,37]

Let $E$ be the ellipsoid

$$\{x \in R^n: (x - x^*)^tH^{-1}(x - x^*) \le 1\},$$

and $V = \{x \in R^n: (a,x) \le b\}$ be a halfspace, and define

$$\omega = ((a,x^*) - b)/(a^tHa)^{1/2};$$

then the smallest volume ellipsoid containing $E \cap V$ is denoted by $E_+$, and:

1. if $\omega \le -1/n$, $E_+ = E$;

2. if $\omega > -1/n$, $\mathrm{Vol}\,E_+ < \mathrm{Vol}\,E$.

The formula giving $E_+$ is well known; it is also easily computable.

If one uses two halfspaces containing $x^*$ and symmetric with respect to $x^*$, then $n$ may be replaced by $\sqrt n$ (this is called the symmetric range ellipsoid principle; see Todd [34]).
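The formula for $E_+$ is not reproduced in the text. For illustration, the following Python sketch uses the standard deep/shallow-cut update of the ellipsoid method (as found, e.g., in the survey of Bland, Goldfarb and Todd), with the cut-depth parameter equal to $\omega$ above. It is a sketch under that assumption, not a transcription of [34,36,37]. The helper names are hypothetical, and $n \ge 2$ is required.

```python
import numpy as np

def primal_ellipsoid_update(x, H, a, b):
    """Smallest-volume ellipsoid containing E intersected with {y: (a, y) <= b},
    E = {y: (y - x)^T H^{-1} (y - x) <= 1}; standard deep/shallow-cut formulas,
    valid for n >= 2 and omega < 1."""
    n = x.size
    Ha = H @ a
    s = np.sqrt(a @ Ha)
    omega = (a @ x - b) / s
    if omega <= -1.0 / n:          # case 1 of the principle: E+ = E
        return x.copy(), H.copy(), omega
    if omega >= 1.0:
        raise ValueError("the intersection has no interior (omega >= 1)")
    tau = (1 + n * omega) / (n + 1)
    sigma = 2 * (1 + n * omega) / ((n + 1) * (1 + omega))
    delta = n * n * (1 - omega**2) / (n * n - 1)
    x_new = x - tau * Ha / s
    H_new = delta * (H - sigma * np.outer(Ha, Ha) / s**2)
    return x_new, H_new, omega

def volume_ratio(H_old, H_new):
    """Vol E+ / Vol E = sqrt(det H_new / det H_old)."""
    return np.sqrt(np.linalg.det(H_new) / np.linalg.det(H_old))

H0, x0, a0 = np.eye(2), np.zeros(2), np.array([1.0, 0.0])
x1, H1, om = primal_ellipsoid_update(x0, H0, a0, 0.0)   # central cut, omega = 0
print(om, x1, volume_ratio(H0, H1))
```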

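The dual ellipsoid principle 6.3

Let $E_d = \{x \in R^n: (x - x_\#)^tG(x - x_\#) \le 1\}$ be an ellipsoid, $x_\# + v$ a point in $R^n$, and

$$\omega' = \sup\{\omega'' \ge 0: x_\# + \omega''v \in E_d\} = (v^tGv)^{-1/2};$$

then, if one defines $E_{d+}$ to be the largest volume ellipsoid contained in $\mathrm{conv}(E_d \cup \{x_\# + v\})$, one has:

1. if $\omega' \ge 1/n$, $E_{d+} = E_d$;

2. if $\omega' < 1/n$, $\mathrm{Vol}\,E_{d+} > \mathrm{Vol}\,E_d$, and

$$E_{d+} = \{x \in R^n: (x - x_{\#+})^tG_+(x - x_{\#+}) \le 1\},$$

where

$$G_+ = \alpha'\Big(G + \beta'\,\frac{Gvv^tG}{v^tGv}\Big), \qquad x_{\#+} = x_\# + \delta'\omega'v,$$

$$\alpha' = \frac{(n+1)(1-\omega')}{(n-1)(1+\omega')}, \qquad \beta' = \frac{n^2\omega'^2 - 1}{1 - \omega'^2}, \qquad \delta' = \frac{1-\omega'}{\omega'(n+1)} - \frac{n-1}{n+1},$$

$$\mathrm{Vol}^2E_{d+} = \frac{(n-1)^{n-1}}{(n+1)^{n+1}}\cdot\frac{(1+\omega')^{n+1}}{\omega'^2(1-\omega')^{n-1}}\,\mathrm{Vol}^2E_d.$$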

No proof shall be given here. It is interesting to note that if one took the duals (or polars; see [30, p. 125]) of the sets involved in the primal ellipsoid principle (where $\omega < 0$ and $\omega' = -\omega$), with respect to the center of $E$, then one could prove a result similar to the dual ellipsoid principle, the only difference being that weaker values of $\alpha'$, $\beta'$ and $\delta'$ ensue. The proof of this involves a bit of work, but is straightforward.


One  could  say  that  the  dual  of  the  primal  ellipsoid  principle  is 
a  weaker  form  of  the  dual  ellipsoid  principle.  On  the  other  hand,  the 
dual  of  the  symmetric  range  primal  ellipsoid  principle  gives  exactly 
the  equivalent  dual  principle. 
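The following Python sketch implements the update of the dual ellipsoid principle 6.3 under our own reading of its parameters. It uses $\alpha' = (n+1)(1-\omega')/((n-1)(1+\omega'))$, $\beta' = (n^2\omega'^2 - 1)/(1 - \omega'^2)$, $\delta' = (1-\omega')/(\omega'(n+1)) - (n-1)/(n+1)$, and new center $x_\# + \delta'\omega'v$. These values were obtained by solving the case where $E_d$ is the unit ball and $v$ lies on a coordinate axis, then mapping back affinely. Treat them as an assumption to be checked against the original statement. The test checks containment in $\mathrm{conv}(E_d \cup \{x_\# + v\})$ through support functions and the volume formula, not maximality. The helper name is hypothetical, and $n \ge 2$ is required.

```python
import numpy as np

def dual_ellipsoid_update(x, G, v):
    """Largest-volume ellipsoid in conv(E, {x + v}), E = {y: (y-x)^T G (y-x) <= 1}.
    Closed form derived for E = unit ball with v on an axis, transported by an
    affine map; requires n >= 2."""
    n = x.size
    Gv = G @ v
    w = 1.0 / np.sqrt(v @ Gv)          # omega'
    if w >= 1.0 / n:                   # case 1: E_d+ = E_d
        return x.copy(), G.copy(), w
    alpha = (n + 1) * (1 - w) / ((n - 1) * (1 + w))
    beta = (n * n * w * w - 1) / (1 - w * w)
    delta = (1 - w) / (w * (n + 1)) - (n - 1) / (n + 1)
    G_new = alpha * (G + beta * np.outer(Gv, Gv) / (v @ Gv))
    x_new = x + delta * w * v
    return x_new, G_new, w

G0, x0, v0 = np.eye(3), np.zeros(3), np.array([5.0, 0.0, 0.0])  # omega' = 0.2 < 1/3
x1, G1, om = dual_ellipsoid_update(x0, G0, v0)
print(om, x1, np.sqrt(np.linalg.det(G0) / np.linalg.det(G1)))   # volume grows
```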

Theorem 6.4

Let $\mathcal{T}(P) = \{T: \sigma(T^{-1}P) \le n,\ T \text{ nonsingular}\}$, where $P$ is a full dimensional, compact, convex set; then $\mathcal{T}(P)$ is not empty. If $E_\# = e_\# + T_\#S$ and $E^* = e^* + T^*S$ are respectively the largest ellipsoid inscribed in $P$ and the smallest ellipsoid circumscribed around $P$, then both $T_\#$ and $T^*$ belong to $\mathcal{T}(P)$. If $P$ is centrally symmetric, then this remains valid with $\sqrt n$ replacing $n$.

Proof:

We will show that $e_\# + nT_\#S \supseteq P$. If this is not true, then there exists a point $v \in R^n$ such that $v + e_\# \in P$ and $v + e_\# \notin e_\# + nT_\#S$; hence

$$\sup\{\omega'' \ge 0: \omega''v \in T_\#S\} = \omega' < 1/n.$$

Thus the dual ellipsoid principle 6.3 shows that there exists, and in fact constructs, an ellipsoid $E_+$ such that $\mathrm{Vol}\,E_+ > \mathrm{Vol}\,E_\#$ and

$$E_+ \subset \mathrm{conv}(E_\# \cup \{e_\# + v\}) \subset P,$$

a contradiction.

A proof dual to this one, using shallow cuts in the primal ellipsoid principle, shows that $T^* \in \mathcal{T}(P)$ (this is John's theorem [23]).

The case of centrally symmetric sets follows, similarly, from the symmetric range ellipsoid principles. QED

As the ellipsoids $E_\#$ and $E^*$ play a crucial role in the proofs of Section 7, we shall characterize $E_\#$ and $E^*$ by the use of the Kuhn-Tucker or Fritz John conditions. For $E_\#$, it is necessary to assume that $P$ is given by a system of linear inequalities, while for $E^*$ it is necessary to assume that $P$ is given by the convex hull of a set of points. The study of $E^*$ is given in John [23], and is probably the first application of the necessary optimality conditions of mathematical programming.

Both of these are optimization problems, with unknowns $H$ (or $G = H^{-1}$), a positive definite symmetric matrix, and $x$, the center of the ellipsoid

$$E = \{y \in R^n: (y - x)^tH^{-1}(y - x) \le 1\}.$$

One could regard $H$ as a point in $R^{n\times n}$, subject to symmetry constraints, but it is more natural to view $H$ as an element of $\mathcal{P}(R^n)$, the cone of positive semidefinite symmetric matrices, which we will assume to be defined within $\mathcal{S}(R^n)$, the (linear) space of symmetric matrices. The interior of $\mathcal{P}(R^n)$ is $\mathcal{P}^\circ(R^n)$, the cone of positive definite symmetric matrices, while the extreme rays of $\mathcal{P}(R^n)$ are the symmetric positive semidefinite rank one matrices (i.e., of the form $aa^t$, with $a \in R^n$).

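If $H, K \in \mathcal{S}(R^n)$, the scalar product is $(H,K) = \operatorname{Tr}HK$, where $\operatorname{Tr}$ means trace, and this induces the Frobenius norm $(\operatorname{Tr}H^2)^{1/2}$. It is known that $\mathcal{P}(R^n)$ is a self-dual convex cone (i.e., $\operatorname{Tr}HK \ge 0$ for all $K \in \mathcal{P}(R^n) \Rightarrow H \in \mathcal{P}(R^n)$).

The equality $\operatorname{Tr}VV^tK = \operatorname{Tr}V^tKV$, valid if $K$ is symmetric, and where $V$ may be rectangular, will be used.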

The volume of $E$ is given by $(\det^{1/2}H)\,\mathrm{Vol}\,S$, and hence one shall maximize $\det H$, or $\ln\det H$; thus an expression of

$$\frac{d\ln\det H}{dH},$$

for $H \in \mathcal{P}^\circ(R^n)$, is needed.

It is known that $\ln\det H$ is a concave function of $H$ (see Fan [10]).


Lemma 6.5

Let $H \in \mathcal{P}^\circ(R^n)$; then $(d\ln\det H)/dH = H^{-1}$, and $\ln\det H$ is strictly concave on $\mathcal{P}^\circ(R^n)$.


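Proof:

We shall compute, for $\varepsilon$ small enough ($H \in \mathcal{P}^\circ(R^n)$, $K \in \mathcal{S}(R^n)$),

$$\begin{aligned}
\ln\det(H + \varepsilon K) - \ln\det H
&= \ln\det H^{1/2}(I + \varepsilon H^{-1/2}KH^{-1/2})H^{1/2} - \ln\det H\\
&= \ln\det(I + \varepsilon H^{-1/2}KH^{-1/2})\\
&= \ln\prod_{i=1}^n(1 + \varepsilon\lambda_i)\\
&= \sum_{i=1}^n\ln(1 + \varepsilon\lambda_i)\\
&= \sum_{i=1}^n\sum_{j=1}^\infty(-1)^{j-1}\frac{(\varepsilon\lambda_i)^j}{j} \qquad (|\varepsilon\lambda_i| < 1)\\
&= \sum_{j=1}^\infty(-1)^{j-1}\frac{\varepsilon^j}{j}\operatorname{Tr}(H^{-1/2}KH^{-1/2})^j,
\end{aligned}$$

where the $\lambda_i$ are the eigenvalues of $H^{-1/2}KH^{-1/2}$, or of $H^{-1}K$ (all real numbers).

Note that $\operatorname{Tr}(H^{-1/2}KH^{-1/2})^j = \operatorname{Tr}(H^{-1}K)^j$, and thus

$$\lim_{\varepsilon\to0}\frac{\ln\det(H + \varepsilon K) - \ln\det H}{\varepsilon} = \operatorname{Tr}H^{-1}K,$$

and thus

$$\frac{d\ln\det H}{dH} = H^{-1}.$$

For the strict concavity, note that the second-order term of the expansion is $-\frac{\varepsilon^2}{2}\operatorname{Tr}(H^{-1/2}KH^{-1/2})^2$, and

$$\operatorname{Tr}(H^{-1/2}KH^{-1/2})^2 > 0, \quad \text{for all } K \ne 0,$$

and this means that the second derivative is negative definite. QED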


It may be checked that for $\ln\det H$ the steepest ascent direction is $H^{-1}$, while the Newton direction is $H$.
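Lemma 6.5 can be sanity-checked numerically. For a positive definite $H$ and a symmetric direction $K$, a central finite difference of $\ln\det$ along $K$ should match $\operatorname{Tr}H^{-1}K$, and the second directional derivative $-\operatorname{Tr}(H^{-1}K)^2$ should be negative. The Python sketch below uses NumPy's `slogdet`; the random test matrices are arbitrary choices of ours.

```python
import numpy as np

def logdet(H):
    """ln det H for a positive definite H."""
    sign, val = np.linalg.slogdet(H)
    if sign <= 0:
        raise ValueError("H must be positive definite")
    return val

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
H = M @ M.T + n * np.eye(n)       # a point of the open cone of positive definite matrices
K = rng.standard_normal((n, n))
K = (K + K.T) / 2                 # a symmetric direction

eps = 1e-5
fd = (logdet(H + eps * K) - logdet(H - eps * K)) / (2 * eps)  # directional derivative
HinvK = np.linalg.solve(H, K)
exact = np.trace(HinvK)                                       # (H^{-1}, K) = Tr H^{-1} K
second = -np.trace(HinvK @ HinvK)                             # second directional derivative
print(fd, exact, second)
```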

Theorem 6.6

Let $P = \{x \in R^n: (a^i,x) \le b_i,\ i \in M\}$ be a compact, full dimensional polyhedron; then a necessary and sufficient condition for

$$E_\# = e_\# + T_\#S = \{x \in R^n: (x - e_\#)^tH_\#^{-1}(x - e_\#) \le 1\},$$

where $H_\# = T_\#T_\#^t$, to be the largest ellipsoid inscribed in $P$ is that there exist nonnegative multipliers $\lambda_i$, $i \in M$, such that:

$$H_\#^{-1} = \sum_{i\in M}\lambda_ia^ia^{it} \quad (\in \mathcal{P}^\circ(R^n)),$$

$$\sum_{i\in M}\lambda_i\big(b_i - (a^i, e_\#)\big)a^i = 0,$$

$$\lambda_i\big[(a^{it}H_\#a^i)^{1/2} - (b_i - (a^i, e_\#))\big] = 0, \quad \text{for all } i \in M,$$

$$(a^{it}H_\#a^i)^{1/2} \le b_i - (a^i, e_\#), \quad \text{for all } i \in M.$$


Proof:

The largest ellipsoid inscribed in $P$ is given by the solution of the following optimization problem:

$$\max\{\det H^{1/2}: x + H^{1/2}S \subset P,\ H \in \mathcal{P}^\circ(R^n)\}.$$

But $x + H^{1/2}S \subset P$ if and only if

$$x + H^{1/2}S \subset \{y \in R^n: (a^i, y) \le b_i\}, \quad \text{for all } i \in M,$$

or

$$(a^{it}Ha^i)^{1/2} \le b_i - (a^i, x), \quad \text{for all } i \in M,$$

and

$$b_i - (a^i, x) > 0, \quad \text{for all } i \in M.$$

Thus the optimization problem becomes:

$$\max\ \ln\det H$$

subject to

$$(a^ia^{it}, H) \le (b_i - (a^i, x))^2, \quad \text{for all } i \in M,$$

with also

$$(a^i, x) - b_i < 0, \quad \text{for all } i \in M, \qquad H \in \mathcal{P}^\circ(R^n).$$

It is clear that the constraints $(a^i, x) < b_i$, $i \in M$, and $H \in \mathcal{P}^\circ(R^n)$ have no impact on the optimality conditions. The objective is strictly concave in $H$ and its gradient is $H^{-1}$. The constraints

$$(a^ia^{it}, H) - (b_i - (a^i, x))^2 \le 0, \quad \text{for all } i \in M,$$

are such that:

$(a^ia^{it}, H)$ is linear in $H$, and its gradient is $a^ia^{it}$;

$-(b_i - (a^i, x))^2$ is concave in $x$, but also quasiconvex on $\{y \in R^n: (a^i, y) \le b_i\}$;

they also satisfy the (extended) Slater constraint qualification.

Thus the Kuhn-Tucker conditions, that is, the theorem, are necessary and sufficient. QED

The condition $H_\#^{-1} = \sum_{i\in M}\lambda_ia^ia^{it}$ (which is also the optimality condition for the largest ellipsoid with a given center) means that $G_\# = H_\#^{-1}$ is a positive linear combination of symmetric rank one matrices, built on the normals to the facets of $P$:

$$G_\# \in \text{convex cone hull}\{a^ia^{it}: i \in M\}.$$

In the primal ellipsoid method [16], one always has $H_+^{-1} = \alpha H^{-1} + \gamma aa^t$, where $a \in \{a^i: i \in M\}$ and $\gamma > 0$; thus the primal ellipsoid method may be described as updating one of the multipliers $\lambda_i$ at every iteration.

The simplex method may be described by a matrix $H^{-1} = \sum_{i\in N}a^ia^{it} = A_N^tA_N$, where $|N| = n$ and $A_N$ is square and of full rank (see Gill and Murray [12]), and updates are performed by adding a new $a^ja^{jt}$, $j \notin N$, and subtracting an old one $a^ia^{it}$, $i \in N$; expressed in terms of $H$ or $A_N$ this is the pivoting operation. So, in a sense, the simplex method always keeps all multipliers but $n$ equal to zero, while the $n$ nonzero multipliers are kept equal to one; and a pivoting is simply an exchange of two $\lambda_i$.

The optimality conditions may be expressed in many different ways; one interesting option, as it permits the interpretation of the multipliers in terms of sensitivity analysis, is

$$\max\{\tfrac12\ln\det H: (a^{it}Ha^i)^{1/2} + a^{it}x \le b_i,\ \text{for all } i \in M\},$$

and it gives

$$H_\#^{-1} = \sum_{i\in M}\mu_i\,\frac{a^ia^{it}}{(a^{it}H_\#a^i)^{1/2}}, \qquad \sum_{i\in M}\mu_ia^i = 0$$

(note that $\lambda_i = \mu_i/(a^{it}H_\#a^i)^{1/2} = \mu_i/(b_i - a^{it}e_\#)$); then (under standard technical assumptions) a perturbation of $b_i$ to $b_i + \varepsilon$ ($\varepsilon$ small) implies that the maximum volume is multiplied, approximately, by $e^{\varepsilon\mu_i}$.

The minimum volume ellipsoid containing a polytope $P = \mathrm{conv}\{v^i: i \in I\}$ is

$$E^* = e^* + T^*S = \{x \in R^n: (x - e^*)^t(H^*)^{-1}(x - e^*) \le 1\},$$

where $H^* = T^*T^{*t} = (G^*)^{-1}$, and is characterized by

$$H^* = \sum_{i\in I}\nu_i(v^i - e^*)(v^i - e^*)^t,$$

$$0 = \sum_{i\in I}\nu_i(v^i - e^*), \quad \nu_i \ge 0,\ i \in I,$$

$$(v^i - e^*)^t(H^*)^{-1}(v^i - e^*) \le 1, \text{ with equality if } \nu_i > 0\ (i \in I).$$

Note that $H^*$ is given by a positive linear combination of symmetric rank-one matrices, constructed on the set of vertices of $P$.

In both cases ($E_\#$ and $E^*$), the complementarity conditions are hard to use, and it is possible to give explicit expressions for $E_\#$ and $E^*$ only in a few particular cases.

If $P$ is an ellipsoid, $P = \{x \in R^n: x^tXx \le w\}$ with $X$ positive definite symmetric, then $G_\# = (H^*)^{-1} = X/w$, and thus the ellipsoid matrix (the "variable" metric) is proportional to the inverse Hessian of $x^tXx$.

If $P = \{x \in R^n: |(a^i,x) - b_i| \le w,\ i = 1, \ldots, n\}$, where $A = (a^1, \ldots, a^n)^t$ is square and nonsingular while $w$ is positive, then $P$ is a parallelotope, and if one lets $x = A^{-1}(y + b)$, then $P$ transforms into a cube, $P' = \{y \in R^n: |y|_\infty \le w\}$; the largest ellipsoid in $P'$ is clearly the sphere $\{y \in R^n: y^ty \le w^2\}$, which transforms back into the maximum volume ellipsoid in $P$:

$$E_\# = \{x \in R^n: (x - x^*)^tA^tA(x - x^*) \le w^2\},$$

where $x^* = e_\# = e^* = A^{-1}b$, $H_\# = w^2A^{-1}A^{-t}$ and $H^* = nH_\#$. If one did write the problem of finding the solution of $Ax = b$ as a quadratic programming problem, i.e., $\min|Ax - b|_2^2$, and if one defines the level set

$$P'' = \{x \in R^n: |Ax - b|_2^2 \le w^2\},$$

then the largest ellipsoid inscribed in $P''$ (namely $P''$ itself) is exactly the same as the one inscribed in $P = \{x \in R^n: |Ax - b|_\infty \le w\}$.
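The parallelotope example can be verified directly. The Python sketch below computes $e_\# = A^{-1}b$, $H_\# = w^2(A^tA)^{-1}$ and $H^* = nH_\#$. It then checks the tangency condition of Theorem 6.6, $(a^{it}H_\#a^i)^{1/2} = w$ (the slack at the center). It also checks that every vertex $A^{-1}(b + ws)$, $s \in \{-1,1\}^n$, lies on the boundary of $E^*$. The helper names and the sample $A$, $b$, $w$ are ours.

```python
import itertools
import numpy as np

def parallelotope_ellipsoids(A, b, w):
    """Closed forms for P = {x: |(a^i, x) - b_i| <= w}, A square nonsingular."""
    n = A.shape[0]
    center = np.linalg.solve(A, b)            # x* = e# = e* = A^{-1} b
    H_in = w**2 * np.linalg.inv(A.T @ A)      # H# = w^2 A^{-1} A^{-t}
    H_out = n * H_in                          # H* = n H#
    return center, H_in, H_out

def parallelotope_vertices(A, b, w):
    """All 2^n vertices A^{-1}(b + w s), s in {-1, 1}^n."""
    n = A.shape[0]
    return [np.linalg.solve(A, b + w * np.array(s))
            for s in itertools.product((-1.0, 1.0), repeat=n)]

A = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, -1.0], [1.0, 0.0, 3.0]])
b = np.array([1.0, -2.0, 4.0])
w = 0.5
center, H_in, H_out = parallelotope_ellipsoids(A, b, w)
# tangency: (a^{it} H# a^i)^{1/2} should equal the slack w at the center
print([float(np.sqrt(ai @ H_in @ ai)) for ai in A])
# each vertex should satisfy (v - e*)^t (H*)^{-1} (v - e*) = 1
print([float((v - center) @ np.linalg.solve(H_out, v - center))
       for v in parallelotope_vertices(A, b, w)])
```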

The last example of a set $P$ for which $E_\#$ (and $E^*$) can be given somewhat explicitly is that of a simplex $P = \{x \in R^n: Ax \le b\}$, where $A \in R^{(n+1)\times n}$ has rank $n$; $P$ has an interior if and only if $\pi^tA = 0$, $\pi^tb = 1$, $\pi > 0$ has a solution ($\pi$ is unique). The linear transformation $T_\#$ (and also $T^*$) has the property that $T_\#^{-1}P$ is a regular simplex; this follows because the largest ellipsoid inscribed in a regular simplex is a sphere, and $T^{-1}E_\#$ is the largest ellipsoid inscribed in $T^{-1}P$, for any $T$.

Also, this shows that $e_\#$ is the center of gravity, or the centroid, of $P$.

After somewhat lengthy calculations, one gets

$$H_\#^{-1} = n(n+1)A^t\,\mathrm{Diag}(\pi_i^2)\,A,$$

where $\pi$ was defined earlier, and $\mathrm{Diag}(\pi_i^2)$ is an $(n+1)\times(n+1)$ diagonal matrix whose diagonal elements are $\pi_i^2$; also $e_\#$ solves the system

$$(a^i, e_\#) + w/\pi_i = b_i, \quad i = 1, \ldots, n+1,$$

where $w$ is a variable (but its value is $w = 1/(n+1)$). If $P$ were defined by its vertices, then let $V \in R^{n\times(n+1)}$ be a matrix whose columns are the vertices of $P$; then

$$e^* = \frac{1}{n+1}V\mathbf 1_{n+1}, \qquad H^* = \frac{n}{n+1}V\Big[I_{n+1} - \frac{1}{n+1}\mathbf 1_{n+1}\mathbf 1_{n+1}^t\Big]V^t,$$

where $\mathbf 1_{n+1}$ is an $(n+1)$-dimensional vector of ones, and $I_{n+1}$ is the $(n+1)$-dimensional unit matrix.
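The closed forms for the simplex can be cross-checked against each other. In the Python sketch below, $\pi$ is obtained from $\pi^tA = 0$, $\pi^tb = 1$, and $H_\#$ from $H_\#^{-1} = n(n+1)A^t\mathrm{Diag}(\pi_i^2)A$. The pair $e_\#$, $w$ comes from $(a^i, e_\#) + w/\pi_i = b_i$, and $e^*$, $H^*$ from the vertex formulas. The test verifies $w = 1/(n+1)$, $e_\# = e^*$ (the centroid), and $H^* = n^2H_\#$ (for a simplex, $E^*$ is $E_\#$ scaled by $n$ about the common center). It also checks tangency of $E_\#$ to every facet and that every vertex lies on the boundary of $E^*$. The helper `simplex_ellipsoids` and the random test simplex are ours.

```python
import numpy as np

def simplex_ellipsoids(A, b):
    """Closed forms for a simplex P = {x: Ax <= b}, A of shape (n+1, n), rank n.
    Returns (pi, e_in, H_in, w, e_out, H_out, V)."""
    m, n = A.shape
    # pi: unique solution of pi^T A = 0, pi^T b = 1
    M = np.vstack([A.T, b[None, :]])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi = np.linalg.solve(M, rhs)
    H_in = np.linalg.inv(n * (n + 1) * A.T @ np.diag(pi**2) @ A)
    # e# and w from (a^i, e#) + w / pi_i = b_i
    sol = np.linalg.solve(np.hstack([A, (1.0 / pi)[:, None]]), b)
    e_in, w = sol[:n], sol[n]
    # vertex k: all facets except facet k are tight
    V = np.column_stack([np.linalg.solve(np.delete(A, k, 0), np.delete(b, k))
                         for k in range(m)])
    one = np.ones(m)
    e_out = V @ one / m
    H_out = n / (n + 1) * V @ (np.eye(m) - np.outer(one, one) / m) @ V.T
    return pi, e_in, H_in, w, e_out, H_out, V

rng = np.random.default_rng(4)
n = 3
top = rng.standard_normal((n, n))
A = np.vstack([top, -top.sum(axis=0)])   # rows sum to zero, so pi is proportional to 1
b = rng.random(n + 1) + 0.5              # b > 0, so x = 0 is an interior point
pi, e_in, H_in, w, e_out, H_out, V = simplex_ellipsoids(A, b)
print(pi, w, e_in, e_out)
```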

All of this suggests rather compellingly that the matrix $H_\#$ (or $H^*$) plays the role of the inverse Hessian in classical optimization.

The matrix $TT^t$ which satisfies $\tau(P) = \sigma(T^{-1}P)$ might be a more satisfactory definition of the "inverse Hessian", but the Kuhn-Tucker conditions which characterize it do not seem terribly insightful.




7.  The  potential  for  polynomiality  of  the  variable  metric,  maximal 
ellipsoidal  distance,  relaxation  method 


In this section, it will be shown that for every system of linear inequalities (assumed consistent) there exists an ellipsoid such that the variable metric, maximal ellipsoidal distance relaxation method (Algorithm 2) is polynomial, or, quite equivalently, that there exists an affine transformation such that the maximal distance relaxation method applied in the transformed space is polynomial. The concept of polynomiality may be interpreted, in a somewhat more practical sense, by saying that the methods converge at a rate of at least $(1 - n^{-2})^{1/2}$.

The class of ellipsoids which lead to the polynomiality of Algorithm 2 contains the largest ellipsoid inscribed in (and the smallest ellipsoid circumscribed around) a perturbation cum compactification of the feasible set $P$ (see compactification scheme 3.20), i.e., $P_w^c$ where $w \in [0, w_1]$, with $w = 0$ acceptable if $\dim P = n$, while if $\dim P < n$, $w^{-1}$ should be no worse than exponential in the input data (that is, $w = w_1/2$, or $w = w_1$, etc.).

All  of  this  will  be  proved  first  for  the  case  when  dim  P  =  n, 
and  then  extended  to  the  general  case. 


Lemma 7.1

Let $P = \{x \in \mathbb{R}^n : Ax \le b\}$ be the solution set of a system of linear inequalities, with $P$ assumed to have a nonempty interior, and let $P^c = \{x \in \mathbb{R}^n : \|x\|_\infty \le c\} \cap P$ be a compactification of $P$ such that $P^c$ contains a point of the relative interior of every face of $P$; then there exists a nonsingular linear map $T$ such that

$$\mu^*(T^{-1}P) \ge \nu(T^{-1}P) \ge (\tau(P^c))^{-1} \ge n^{-1};$$

if $T_*$ is the linear map given by the largest ellipsoid inscribed in $P^c$, then

$$\mu^*(T_*^{-1}P) \ge \nu(T_*^{-1}P) \ge (\sigma(T_*^{-1}P^c))^{-1} \ge n^{-1}.$$

If $P$ is centrally symmetric, and if the compactification cube is centered at the center of symmetry of $P$, then everything is valid with $n^{-1/2}$ replacing $n^{-1}$.

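If $A \in \mathbb{Z}^{m \times n}$ and $b \in \mathbb{Z}^m$ are integer, then $T$ and $T_*$ may be "approximated" by polynomial space integer matrices $\tilde T$ and $\tilde T_*$ which satisfy

$$\mu^*(\tilde T^{-1}P) \ge \nu(\tilde T^{-1}P) \ge (\tau(P^c))^{-1}\, \frac{\alpha-1}{\alpha+1},$$

and

$$\mu^*(\tilde T_*^{-1}P) \ge \nu(\tilde T_*^{-1}P) \ge (\sigma(T_*^{-1}P^c))^{-1}\, \frac{\alpha-1}{\alpha+1} \ge n^{-1}\, \frac{\alpha-1}{\alpha+1},$$

where $\alpha > 1$ is such that $\log \alpha$ is polynomial in the length of the input.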
Proof:


Using Theorems 3.14 and 3.21, one has that for any nonsingular transformation $T$:

$$\mu^*(T^{-1}P) \ge \nu(T^{-1}P) \ge (\sigma(T^{-1}P^c))^{-1}.$$

If one takes in this the linear transformation $T$ such that $\sigma(T^{-1}P^c) = \tau(P^c)$, or the linear transformation $T_*$ given by the largest ellipsoid inscribed in $P^c$, then Theorem 6.4 proves the first part of this lemma.

Now assume that $A$ and $b$ are integer, so that the estimates of Section 5 hold. The work will be done on $T$, but clearly extends to $T_*$, or to any map with similar properties.

We will assume, without restriction, that $T$ is symmetric positive definite (if it is not, then $(TT^t)^{1/2} = H^{1/2}$ is, and defines the same ellipsoid). Now we have $e + TS \subset P^c \subset e + \tau TS$. Define $T'$ as a symmetric perturbation of $T$ such that $\max_{i,j} |t_{ij} - t'_{ij}| \le \varepsilon$, where $\varepsilon > 0$ is arbitrary for now.

Then

$$\rho(T - T') \le n \max_{i,j} |t_{ij} - t'_{ij}| \le n\varepsilon,$$

where $\rho$ denotes the spectral norm. If $\rho_i(T)$ (resp. $\rho_i(T')$), $i = 1, \ldots, n$, are the ordered eigenvalues of $T$ (resp. $T'$), then

$$|\rho_i(T) - \rho_i(T')| \le \rho(T - T') \le n\varepsilon,$$


and thus $T'$ is positive definite if $\rho_1(T) > n\varepsilon$ (where $\rho_1(T)$ is the smallest eigenvalue of $T$), as $\rho_1(T') \ge \rho_1(T) - n\varepsilon$.
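The two perturbation facts used here, $\rho(T - T') \le n \max_{i,j} |t_{ij} - t'_{ij}|$ and the eigenvalue bound $|\rho_i(T) - \rho_i(T')| \le \rho(T - T')$ (Weyl's inequality for symmetric matrices), can be checked numerically. The following NumPy sketch builds a random symmetric positive definite $T$, perturbs it symmetrically by at most $\varepsilon$ per entry, and reports both quantities; the random construction and the function name are arbitrary choices for illustration.

```python
import numpy as np


def check_perturbation(n, eps, seed=0):
    """Perturb a random SPD matrix T entrywise by at most eps (symmetrically) and return
    (rho(T - T'), max_i |rho_i(T) - rho_i(T')|, smallest eigenvalue of T, same for T')."""
    rng = np.random.default_rng(seed)
    M = rng.normal(size=(n, n))
    T = M @ M.T + n * np.eye(n)                 # symmetric positive definite
    E = rng.uniform(-eps, eps, size=(n, n))
    E = (E + E.T) / 2                            # symmetric, entries still in [-eps, eps]
    T_prime = T + E
    spectral = np.linalg.norm(T - T_prime, 2)    # largest singular value
    ev, ev_prime = np.linalg.eigvalsh(T), np.linalg.eigvalsh(T_prime)  # ascending order
    shift = np.max(np.abs(ev - ev_prime))
    return spectral, shift, ev[0], ev_prime[0]


spectral, shift, rho1, rho1_prime = check_perturbation(n=6, eps=0.05)
print(spectral, shift, rho1, rho1_prime)
```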

In  order  to  keep  the  notation  consistent  with  the  remainder  of 
this  paper,  let 

$$\lambda = \lambda(H) = (\rho_1(T))^2, \quad \Lambda = \Lambda(H) = (\rho_n(T))^2,$$

$$\lambda' = \lambda(H') = (\rho_1(T'))^2, \quad \Lambda' = \Lambda(H') = (\rho_n(T'))^2,$$

where $H = T^2$ and $H' = T'^2$.
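Now let

$$\delta = \inf\{\delta' > 0 : TS + \delta' S \supset T'S,\ T'S + \delta' S \supset TS\}$$

be the Hausdorff distance between the two ellipsoids $TS$ and $T'S$; then, since $S \subset \lambda^{-1/2}\, TS$ and $S \subset (\lambda')^{-1/2}\, T'S$,

$$T'S \subset TS + \delta S \subset (1 + \delta \lambda^{-1/2})\, TS,$$

and

$$TS \subset T'S + \delta S \subset (1 + \delta (\lambda')^{-1/2})\, T'S.$$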
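These inclusions can be spot-checked through support functions: for a symmetric $T$ the support function of $TS$ in direction $u$ is $\|Tu\|$, so $T'S \subset (1 + \delta\lambda^{-1/2})TS$ amounts to $\|T'u\| \le (1 + \delta\lambda^{-1/2})\|Tu\|$ for all $u$. The NumPy sketch below uses $\rho(T - T')$, an upper bound on $\delta$, in place of $\delta$ (which only enlarges the factors) and samples random directions; it is a numerical illustration under these assumptions, not a proof, and the matrices are made up.

```python
import numpy as np


def inclusion_holds(T, T_prime, n_dirs=2000, seed=0):
    """Check ||T'u|| <= (1 + d/rho_1(T)) ||Tu|| and ||Tu|| <= (1 + d/rho_1(T')) ||T'u||
    on random directions u, with d = rho(T - T') and rho_1 the smallest eigenvalue."""
    d = np.linalg.norm(T - T_prime, 2)
    f1 = 1 + d / np.linalg.eigvalsh(T)[0]          # 1 + delta * lambda^{-1/2}
    f2 = 1 + d / np.linalg.eigvalsh(T_prime)[0]    # 1 + delta * (lambda')^{-1/2}
    U = np.random.default_rng(seed).normal(size=(n_dirs, T.shape[0]))
    h = np.linalg.norm(U @ T.T, axis=1)            # support function of T S (T symmetric)
    h_prime = np.linalg.norm(U @ T_prime.T, axis=1)
    return bool(np.all(h_prime <= f1 * h + 1e-9) and np.all(h <= f2 * h_prime + 1e-9))


T = np.array([[2.0, 0.3], [0.3, 1.0]])
T_prime = np.array([[2.0, 0.2], [0.2, 1.1]])
print(inclusion_holds(T, T_prime))
```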

Hence

$$e + (1 + \delta \lambda^{-1/2})^{-1}\, T'S \subset P^c \subset e + \tau (1 + \delta (\lambda')^{-1/2})\, T'S,$$

and thus

$$\sigma((T')^{-1}P^c) \le \tau (1 + \delta \lambda^{-1/2})(1 + \delta (\lambda')^{-1/2});$$


but also (see [22]):

$$\delta \le \rho(T - T') \le n\varepsilon.$$


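Now $\lambda^{1/2} \ge w_{-1} \tau^{-1}$ (where $-w_{-1}$ is the first negative vertex level, and $w_{-1}$ can be bounded by the same estimates as $w_1$); this follows because a sphere of radius $w_{-1}$ is contained in $P^c$, and thus also in $e + \tau TS$.

Choose $\varepsilon = w_{-1}/(\alpha n \tau)$, where $\alpha > 1$; then

$$(\lambda')^{1/2} \ge \lambda^{1/2} - n\varepsilon \ge w_{-1} \tau^{-1} (1 - \alpha^{-1}),$$

$$\delta \le w_{-1} (\alpha \tau)^{-1},$$

so that $\delta \lambda^{-1/2} \le \alpha^{-1}$ and $\delta (\lambda')^{-1/2} \le (\alpha - 1)^{-1}$, and thus

$$\sigma((T')^{-1}P^c) \le \tau\, \frac{\alpha+1}{\alpha-1}.$$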

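The ellipsoid $e + TS$ is contained in $\{x \in \mathbb{R}^n : \|x\|_\infty \le c\}$, and thus $\Lambda^{1/2} \le n^{1/2} c$; but $\Lambda \ge (1/n) \operatorname{Tr} T^2 = (1/n) \sum_{i,j} t_{ij}^2$, and thus

$$|t_{ij}| \le n^{1/2} \Lambda^{1/2} \le nc.$$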

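If one chooses $D = \lceil \alpha n \tau / w_{-1} \rceil$, $\tilde t_{ij} = \lfloor t_{ij} D \rfloor$ and $t'_{ij} = \tilde t_{ij}/D$, then

$$|t_{ij} - t'_{ij}| \le D^{-1} \le w_{-1}/(\alpha n \tau),$$

and thus, as $\tilde T = D\,T'$ and scaling the map does not change $\sigma$,

$$\sigma(\tilde T^{-1}P^c) = \sigma((T')^{-1}P^c) \le \tau(P^c)\, \frac{\alpha+1}{\alpha-1}.$$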

The matrix $\tilde T$ is an integer matrix, representable in polynomial space if $\alpha$ is representable in polynomial space (i.e., $\log \alpha$ is polynomial), as $|\tilde t_{ij}| \le D\,|t_{ij}| + 1 \le D\,nc + 1$.

QED
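The rounding step of the proof can be sketched in a few lines of NumPy. Given a symmetric positive definite $T$ and the quantities $\alpha$, $\tau$ and $w_{-1}$, the sketch forms $D = \lceil \alpha n \tau / w_{-1} \rceil$, the integer matrix $\tilde T = \lfloor DT \rfloor$ and $T' = \tilde T / D$, and evaluates $(1 + \delta\lambda^{-1/2})(1 + \delta(\lambda')^{-1/2})$ with $\delta$ replaced by its upper bound $\rho(T - T')$, which the argument above bounds by $(\alpha+1)/(\alpha-1)$. The test matrix and the choice $w_{-1} = \tau\lambda^{1/2}$ (the extreme case allowed by $\lambda^{1/2} \ge w_{-1}\tau^{-1}$) are made up for illustration, and the function names are hypothetical.

```python
import math

import numpy as np


def integer_approximation(T, alpha, tau, w_neg):
    """Return (D, T_int, T_prime) with D = ceil(alpha n tau / w_neg),
    T_int = floor(D T) (an integer matrix) and T_prime = T_int / D."""
    n = T.shape[0]
    D = math.ceil(alpha * n * tau / w_neg)
    T_int = np.floor(D * T).astype(np.int64)
    return D, T_int, T_int / D


def excentricity_factor(T, T_prime):
    """(1 + d/rho_1(T)) (1 + d/rho_1(T')), with d = rho(T - T') bounding delta."""
    d = np.linalg.norm(T - T_prime, 2)
    return (1 + d / np.linalg.eigvalsh(T)[0]) * (1 + d / np.linalg.eigvalsh(T_prime)[0])


T = np.array([[2.0, 0.3], [0.3, 1.0]])
alpha, tau = 2.0, 2.0
w_neg = tau * np.linalg.eigvalsh(T)[0]  # extreme case of lambda^{1/2} >= w_{-1} / tau
D, T_int, T_prime = integer_approximation(T, alpha, tau, w_neg)
print(D, T_int, excentricity_factor(T, T_prime), (alpha + 1) / (alpha - 1))
```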


In the case where the system $Ax \le b$ is totally unimodular, $\sup_{i,j} |\tilde t_{ij}|$ is approximately given by $n^{1/2} b$ (if one takes $\alpha = 2$).

Lemma 7.1 and Theorem 5.2 lead immediately to Theorem 7.2, which says that any consistent system of linear inequalities may be solved in polynomial time (and space) by applying Algorithm 2, with suitable matrices $H$, to the perturbed system $P_w$ (with $w$ taken as $w_1/2$, say, if $P$ has no interior, while $w = 0$ is satisfactory if $P$ has an interior), with a termination routine appended to it. The matrix $H = TT^t$ should be chosen as any matrix such that $\sigma(T^{-1}P_w^c)$ is polynomial in $n$, where $P_w^c$ is a compactification (see 3.20) of $P_w$; the matrix which defines the largest ellipsoid in $P_w^c$ is a most sensible choice. We would like to point out that the proof of this is quite natural, and in fact does not require any of the work done in the later part of Section 3; but the resulting algorithm, which solves a perturbed problem, is somewhat of a mathematical artifact.




In  Theorem  7.3  the  perturbation  is  used  at  the  level  of  the 
proof,  which  makes  it  a  bit  trickier,  as  it  requires  the  whole  of 
Section  3,  but  the  algorithm  is  much  crisper. 

Theorem 7.2

Let $P = \{x \in \mathbb{R}^n : \langle a^i, x \rangle \le b_i,\ i \in M\}$ be the solution set of a consistent system of linear inequalities, and let $w \in [0, w_1]$ (where $w = 0$ is acceptable if $P$ has an interior, while, if not, $\log(w^{-1})$ and $\log((w_1 - w)^{-1})$ should be polynomial; $w = w_1/2$ will do in all cases); then Algorithm 2, applied to the system

$$P_w = \{x \in \mathbb{R}^n : \langle a^i, x \rangle \le b_i + \|a^i\|\, w,\ i \in M\},$$

using a matrix $H = TT^t$ such that $\sigma(T^{-1}P_w^c)$ is polynomial in $n$ (where $P_w^c$ is a compactification of $P_w$), and switching to the termination routine 4.1 on the basis of the termination criterion $f(x^q) < w_1$, will converge in polynomial time to a solution of $P$.

A most sensible choice for $H$ is $H_* = T_* T_*^t$, which gives the largest ellipsoid inscribed in $P_w^c$.

All of the computations can be done using (polynomial space) integer arithmetic, provided that $T$ is taken as an integer matrix (polynomial space), which is always possible.
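For orientation, here is a minimal Python sketch of a variable metric, maximal ellipsoidal distance relaxation iteration of the kind the theorem refers to. It assumes that Algorithm 2 picks the inequality whose ellipsoidal distance $(\langle a^i, x \rangle - b_i)/(a^{i\,t} H a^i)^{1/2}$ is largest and projects onto it in the metric defined by $H^{-1}$. The function name, the optional relaxation factor `relax`, the tolerance and the iteration cap are assumptions, and neither the perturbation $P_w$ nor the termination routine 4.1 is implemented.

```python
import numpy as np


def max_ellipsoidal_distance_relaxation(A, b, H, x0, relax=1.0, tol=1e-12, max_iter=10_000):
    """Sketch of a variable metric, maximal ellipsoidal distance relaxation method
    for A x <= b (assumed consistent), with symmetric positive definite H.

    Returns (x, k): the final iterate and the number of relaxation steps taken.
    """
    A, b, H = (np.asarray(z, dtype=float) for z in (A, b, H))
    x = np.asarray(x0, dtype=float).copy()
    h_norms = np.sqrt(np.einsum("ij,jk,ik->i", A, H, A))  # (a^i^t H a^i)^{1/2}
    for k in range(max_iter):
        dist = (A @ x - b) / h_norms   # ellipsoidal distance to each half-space
        i = int(np.argmax(dist))
        if dist[i] <= tol:             # every inequality holds (up to tol)
            return x, k
        a = A[i]
        Ha = H @ a
        # projection onto {<a, x> <= b_i} in the metric of H^{-1}, scaled by relax
        x = x - relax * (a @ x - b[i]) / (a @ Ha) * Ha
    return x, max_iter


A = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.array([1.0, 1.0, 0.0])
x, k = max_ellipsoidal_distance_relaxation(A, b, H=np.diag([2.0, 1.0]), x0=[5.0, 7.0])
print(x, k)
```

With `relax=1.0` each step is an exact projection in the $H^{-1}$ metric; values of `relax` in $(0, 2)$ correspond to the kind of relaxation parameter discussed in the Conclusions.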




Proof: 

The function $f$ associated with the system of linear inequalities

$$P_w = \{x \in \mathbb{R}^n : (\langle a^i, x \rangle - b_i)/\|a^i\| \le w,\ i \in M\}$$

is simply $f(x) - w$, and the corresponding first vertex level is $w_1 - w$.

Now let $T$ be such that $\tau = \sigma(T^{-1}P_w^c)$ is polynomial in $n$ ($\tau \le n$ is always possible); then, following Lemma 7.1, $\lambda(H) \ge w_{-1}^2 \tau^{-2}$ and $\Lambda(H) \le nc^2$, where $H = TT^t$.

Theorem 5.2 implies that $f(x^q) < w_1$, or equivalently $f(x^q) - w < w_1 - w$, will be reached after at most

$$2\tau^2 \ln\left[\frac{\Lambda^{1/2}(H)\, \tau\, (f(0) - w)}{\lambda^{1/2}(H)\, (w_1 - w)}\right]$$

iterations, after which at most $n$ steps of the termination routine are required.

All  of  that  is  polynomial  time,  and  if  T  is  selected  as  a 
polynomial  space  integer  matrix,  which  can  be  done  by  Lemma  7.1,  then 
all  computations  can  be  performed  in  polynomial  space  (Lemma  5.1). 

QED 


Theorem  7.3 

Let $P = \{x \in \mathbb{R}^n : Ax \le b\}$ be the solution set of a consistent system of linear inequalities, and let $P_w^c = \{x \in P_w : \|x\|_\infty \le c(w)\}$ (see compactification scheme 3.20), where $w \in (0, w_1)$, be a perturbation cum compactification of $P$ which preserves some of the relative interior of every face of $P_w$; then there exists a nonsingular linear map $T$ such that

$$\mu^*(T^{-1}P) \ge \nu(T^{-1}P_w) \ge (\tau(P_w^c))^{-1} \ge n^{-1};$$

if $T_*$ is the linear map given by the largest ellipsoid inscribed in $P_w^c$, then

$$\mu^*(T_*^{-1}P) \ge \nu(T_*^{-1}P_w) \ge (\sigma(T_*^{-1}P_w^c))^{-1} \ge n^{-1}.$$

If $P_w$ is centrally symmetric, and if the compactification cube is centered at the center of symmetry of $P_w$, then everything is valid with $n^{-1/2}$ replacing $n^{-1}$.

If $A \in \mathbb{Z}^{m \times n}$ and $b \in \mathbb{Z}^m$ are integer matrices, then $T$ and $T_*$ may be "approximated" by polynomial space integer matrices $\tilde T$ and $\tilde T_*$ which satisfy

$$\mu^*(\tilde T^{-1}P) \ge \nu(\tilde T^{-1}P_w) \ge (\tau(P_w^c))^{-1}\, \frac{\alpha-1}{\alpha+1} \ge n^{-1}\, \frac{\alpha-1}{\alpha+1},$$

and

$$\mu^*(\tilde T_*^{-1}P) \ge \nu(\tilde T_*^{-1}P_w) \ge (\sigma(T_*^{-1}P_w^c))^{-1}\, \frac{\alpha-1}{\alpha+1} \ge n^{-1}\, \frac{\alpha-1}{\alpha+1},$$

where $\alpha > 1$ is such that $\log \alpha$ is polynomial in the length of the input.

Proof: 

Let $T$ be a nonsingular linear map, and let $\tau = \sigma(T^{-1}P_w^c)$, or $\tau \ge \sigma(T^{-1}P_w^c)$; then by Theorems 3.14 and 3.21, $\mu^*(T^{-1}P_w) \ge \nu(T^{-1}P_w) \ge \tau^{-1}$, and thus, using Theorem 6.4, the present theorem is proved if

$$\mu^*(T^{-1}P) \ge \nu(T^{-1}P_w)$$

can be shown.
The faces of $T^{-1}P$ are $T^{-1}F$, where $F$ is a face of $P$, and the (face) index lattices are equal: $\mathcal{I}(P) = \mathcal{I}(T^{-1}P)$.

Let $T^{-1}F$ be an arbitrary face of $T^{-1}P$, $D_F$ an arbitrary outside face of $\partial f(T^{-1}F, T)$, the subdifferential of $f(y, T)$ at $T^{-1}F$, and $z$ an arbitrary point of $D_F$; Lemma 3.11 indicates that $\mu^*(T^{-1}P) \ge \nu$ if and only if for all such $z$ one has $\|z\| \ge \nu$.

As $z$ belongs to an outside face, Lemma 3.13 (see proof) implies that $\rho z \notin \partial f(T^{-1}F, T)$ for all $\rho > 1$; also $\|z\| > 0$.

But $z \in D_F \subset \partial f(T^{-1}F, T) \subset N_{T^{-1}P}(T^{-1}F)$, and thus, multiplying by $T^{-t}$, one has $T^{-t} z \in N_P(F)$ (it is not true that $T^{-t} z \in \partial f(F)$, because the face lattice of the subdifferential is not preserved under linear transformations followed by normalization).

Using Lemma 3.13, there exists an outside face $D$ of $\partial f(F)$, with index set $J(D)$ (clearly $J(D) \subset I(F)$), such that $T^{-t} z \in \lambda D$ ($\lambda > 0$); by Lemma 3.18, there always exists a face $F_w$ of $P_w$ such that $I(F_w) = J(D)$ (note also that $P_w$ is nondegenerate), and $T^{-t} z \in \lambda\, \partial f(F_w) \subset N_{P_w}(F_w)$.

Now, if $F_w$ is a face of $P_w$, then $T^{-1}F_w$ is a face of $T^{-1}P_w$, where

$$T^{-1}P_w = \{y \in \mathbb{R}^n : (\langle T^t a^i, y \rangle - b_i - \|a^i\|\, w)/\|T^t a^i\| \le 0,\ i \in M\}$$

is understood as being given by a renormalized system of linear inequalities.
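The renormalized description of $T^{-1}P_w$ is easy to form explicitly. The NumPy sketch below (function name and data hypothetical) builds the rows $T^t a^i / \|T^t a^i\|$ and the right-hand sides $(b_i + \|a^i\|\, w)/\|T^t a^i\|$; dividing each inequality by a positive number does not change the set, so $y$ satisfies the transformed system exactly when $Ty \in P_w$.

```python
import numpy as np


def transformed_perturbed_system(A, b, w, T):
    """Return (G, g) with T^{-1} P_w = {y : G y <= g}, each row of G having unit norm.

    P_w = {x : <a^i, x> <= b_i + ||a^i|| w}; substituting x = T y gives
    <T^t a^i, y> <= b_i + ||a^i|| w, which is then divided by ||T^t a^i||.
    """
    A, b, T = (np.asarray(z, dtype=float) for z in (A, b, T))
    row_norms = np.linalg.norm(A, axis=1)   # ||a^i||
    AT = A @ T                               # row i is (T^t a^i)^t
    scale = np.linalg.norm(AT, axis=1)       # ||T^t a^i||
    return AT / scale[:, None], (b + row_norms * w) / scale


A = np.array([[1.0, 2.0], [-3.0, 1.0], [0.5, -1.0]])
b = np.array([4.0, 2.0, 1.0])
T = np.array([[2.0, 0.5], [0.5, 1.0]])
G, g = transformed_perturbed_system(A, b, w=0.1, T=T)
print(G, g)
```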


Hence $z \in N_{T^{-1}P_w}(T^{-1}F_w)$, where

$$I(T^{-1}F_w) = I(F_w) = J(D) \subset I(F) = I(T^{-1}F).$$

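Now select any $z'$ ($z' = \lambda' z$, $\lambda' > 0$) which belongs to the subdifferential associated with $T^{-1}P_w$ at $T^{-1}F_w$, and thus

$$z' = \sum_{i \in J(D)} \lambda_i\, T^t a^i / \|T^t a^i\|,$$

with $\lambda_i \ge 0$ for all $i \in J(D)$; now Lemma 3.11 (see proof) implies that

$$\|z'\| \ge \nu\big(\partial_{T^{-1}P_w}(T^{-1}F_w)\big) \ge \nu = \nu(T^{-1}P_w).$$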

But $z'$ also belongs to $\partial f(T^{-1}F, T)$, as $J(D) \subset I(F) = I(T^{-1}F)$, and hence $\lambda' \le 1$; whence $\|z\| = (\lambda')^{-1} \|z'\| \ge \nu$, which proves the first part of the theorem.

The  remainder  of  the  theorem  follows  from  a  proof  similar  to  that 
given  in  Lemma  7.1.  QED 

The fact that a linear map $T$ exists which satisfies $\mu^*(T^{-1}P) \ge (\tau(P_w^c))^{-1}$ indicates, somewhat misleadingly, that Algorithm 2 may converge faster than linearly only if $\tau(P_w^c) = 1$, which can be true only if $P_w^c$ is an ellipsoid; what happens then is, in fact, one-step convergence.

Thus Theorem 7.3 implies the following theorem, which is proved almost exactly as Theorem 7.2.
Theorem 7.4

Theorem 7.2 remains valid if Algorithm 2 is applied directly to the unperturbed system $P$; the only difference is that $\log((w_1 - w)^{-1})$ need not be polynomial.

In fact, $w = w_1$ could always be used. It should also be clear that the best rate of convergence which Algorithm 2 may achieve, in theory, is $(1 - (\tau(P_w^c))^{-2})^{1/2}$, i.e., it depends upon the affine excentricity of a perturbation cum compactification of $P$.

If the system $P$ were infeasible, then the solution of $\min f(x)$ (where $\min f(x) = w_1$) identifies the normalized Cebysev solution of the infeasible system of linear inequalities; this is really an optimization problem, which can be solved by subgradient optimization, a technique which differs from the relaxation method only in the choice of the step size [32]. A variable metric subgradient optimization method using, say, the maximum volume ellipsoid inscribed in $P_{w_2}^c$ will also converge in polynomial time (under proper choices of the step size).
8. Conclusions

It has been shown that variable metric, maximal ellipsoidal distance, relaxation algorithms may solve any system of linear inequalities in polynomial time, which, within the context of such methods, means fast or good (or better, less bad). The ellipsoid matrices which lead to polynomiality may be viewed as inverse Hessians, and are given, for instance, by the maximum volume ellipsoid inscribed in a perturbation cum compactification of the feasible set.

This method should probably be viewed as a conceptual algorithm, but an implementation of it is the ellipsoid method, and in fact it may be hoped that some of the insights gleaned from its study will lead to improvements of the ellipsoid method.

The method may be practical in problems where an educated guess of the matrix $H$ may be accurate enough to be useful; we are thinking of linear programs derived from combinatorial problems, where a set of potential subgradients can be described a priori (maybe a simplex) and could approximate the set of subgradients at the optima.

Another issue of practical importance is the possibility of introducing a relaxation parameter in Algorithms 1 and 2; this does not affect the theory given here but, in practice, it has always led to significant improvements in the rate of convergence of the method, and we conjecture that if one used the optimal variable metric and the optimal relaxation parameter, then the rate of convergence
would  be  of  the  order  of  (1  -  n~ 
A somewhat major annoyance is that the theory given here does not extend to other implementations of the relaxation method, such as the maximal residual relaxation method.

The main difficulty is that the perturbed sets $P_w$ may behave poorly as $w$ increases, and thus the proofs given here do not seem to extend to that case.